Building forecast models for surface air temperature in Brazil - station A601 in the southeast
This code sets up the working environment for data analysis, visualization, and statistical testing on the temperature time series from a meteorological station in southeastern Brazil.
The loaded libraries cover tasks such as data manipulation, plotting, statistical modeling, and model diagnostics. The additional settings suppress distracting warnings and enforce a consistent visual style for the plots.
## Load the required libraries
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
sns.set_style("darkgrid")
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_ljungbox
from IPython.display import display
from statsmodels.sandbox.stats.runs import runstest_1samp
This code creates two empty data structures (ResTableVal and ResTableTest) to store model-related information, such as model names and residual values.
These DataFrames will later be used to record and analyze the validation and test results of the different models.
# Create an empty DataFrame for Validation and Test
columns = ["ModelName", "ResValues"]
ResTableVal = pd.DataFrame(columns=columns)
ResTableTest = pd.DataFrame(columns=columns)
The function `metricas` evaluates the performance of a forecasting model. It computes MAE, MSE, RMSE, MAPE, and SSE, and runs the Ljung-Box test to check for autocorrelation in the residuals.
The function returns a DataFrame with the computed values.
### Function to compute the model metrics
from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_absolute_percentage_error
def metricas(modelo, test, y_pred):
# Calcular el p-valor de Ljung-Box
residuals = test - y_pred
max_lag = min(20, len(residuals) - 1)
ljung_box_results = acorr_ljungbox(residuals, lags=[max_lag], return_df=True)
p_value = ljung_box_results["lb_pvalue"].values[0]
return pd.DataFrame({
"Modelo": [modelo],
"MAE": [mean_absolute_error(test, y_pred)],
"MSE": [mean_squared_error(test, y_pred)],
"RMSE": [np.sqrt(mean_squared_error(test, y_pred))],
"MAPE": [mean_absolute_percentage_error(test, y_pred) * 100],
"SSE": [mean_squared_error(test, y_pred) * len(test)],
"LjB p-value": [p_value]
}, columns=["Modelo", "MAE", "MSE", "RMSE", "MAPE", "SSE", "LjB p-value"])
This code loads a CSV file containing temperature readings and 25 additional meteorological variables.
It cleans and transforms the data (type conversion and removal of missing values) and filters the dataset to keep only temperature measurements within a valid range.
The result is a clean DataFrame ready for analysis, containing only the timestamp (Data_Hora) and the temperature.
## Load and clean the data
df = (
pd.read_csv('/content/soudeste_a601.csv')
    .rename(columns={
        'TEMPERATURA DO AR - BULBO SECO, HORARIA (°C)': 'Temperature'
    })
.assign(
Data=lambda df: pd.to_datetime(df['Data'], format='%Y-%m-%d', errors='coerce'),
Temperature=lambda df: pd.to_numeric(df['Temperature'], errors='coerce'),
Data_Hora=lambda df: pd.to_datetime(df['Data'].astype(str) + ' ' + df['Hora'], format='%Y-%m-%d %H:%M', errors='coerce')
)
.dropna(subset=['Data', 'Hora', 'Temperature'])
)
## Keep only the timestamp and the target variable
data = (df.query("-50 <= Temperature <= 50")
.filter(items=["Data_Hora", "Temperature"])
)
The function detect_outliers_iqr identifies outliers in a dataset using the interquartile range (IQR).
It returns a boolean mask (True: outlier, False: not an outlier) indicating which values fall outside the bounds defined by the IQR.
def detect_outliers_iqr(data, threshold=1.5):
Q1 = data.quantile(0.25)
Q3 = data.quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - threshold * IQR
upper_bound = Q3 + threshold * IQR
outliers = (data < lower_bound) | (data > upper_bound)
return outliers
This code detects outliers in the 'Temperature' column of the DataFrame data using the detect_outliers_iqr function.
It then counts the outliers found, i.e., the number of True values in the resulting boolean mask.
Finally, it prints the result as "Number of outliers: X", where X is the outlier count.
outliers = detect_outliers_iqr(data['Temperature'])
outlier_count = outliers.sum()
print(f"Number of outliers: {outlier_count}")
Number of outliers: 1721
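To put a count like this in context, the boolean mask can also be expressed as a fraction of the series. A self-contained illustration (the synthetic series below stands in for `data['Temperature']`; in the notebook, `outliers` from the cell above would be used directly):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the temperature column (illustrative only)
rng = np.random.default_rng(0)
temp = pd.Series(rng.normal(25, 3, 1000))
temp.iloc[:10] = 60  # inject a few extreme values

def detect_outliers_iqr(data, threshold=1.5):
    # Same IQR rule as the function defined above
    Q1, Q3 = data.quantile(0.25), data.quantile(0.75)
    IQR = Q3 - Q1
    return (data < Q1 - threshold * IQR) | (data > Q3 + threshold * IQR)

mask = detect_outliers_iqr(temp)
# mask.mean() is the fraction of True values in the boolean mask
print(f"Outliers: {mask.sum()} ({mask.mean() * 100:.2f}% of observations)")
```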
Simple exponential smoothing weights past observations exponentially, giving more weight to recent data and less to older data. This is controlled by a smoothing factor ($λ$), a value between 0 and 1: the closer $λ$ is to 1, the more weight recent observations receive.
The data in data are converted to a pandas Series indexed by the variable holding the date and time (Data_Hora).
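The weighting scheme can be made concrete: expanding the recursion $\tilde{y}_t = λ y_t + (1-λ)\tilde{y}_{t-1}$ shows that the observation $k$ steps back receives weight $λ(1-λ)^k$. A quick numeric check (the value $λ = 0.5$ is illustrative):

```python
import numpy as np

lam = 0.5  # smoothing parameter (lambda); illustrative value

# Weight assigned by the SES recursion to the observation k steps back:
# w_k = lambda * (1 - lambda)**k
weights = np.array([lam * (1 - lam) ** k for k in range(6)])
print(weights)        # most recent observation first: 0.5, 0.25, 0.125, ...
print(weights.sum())  # partial sums approach 1 as more history is included
```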
## Convert _data_ to a time series object
data_ts = pd.Series(data=data['Temperature'].values, index=data['Data_Hora'])
data_ts.head()
data_ori = data_ts
This code produces two plots for the time series (data_ts): one of the autocorrelation function and one of the partial autocorrelation function.
It uses plot_acf and plot_pacf from statsmodels to visualize temporal dependence in the data over 48 lags.
The plots appear in a single figure with two subplots, titled "Autocorrelación" and "Autocorrelación Parcial", respectively.
Finally, the layout is adjusted and the figure is displayed with plt.show().
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
# Autocorrelation on the left
plot_acf(data_ts, lags=48, ax=axes[0])
axes[0].set_title('Autocorrelación')
# Partial Autocorrelation on the right
plot_pacf(data_ts, lags=48, ax=axes[1])
axes[1].set_title('Autocorrelación Parcial')
plt.tight_layout()
plt.show()
This code plots the time series data_ts, the surface temperature at station Soudeste Brasil A601 from May 2000 to April 2021.
The axes are labeled: the X axis shows time at hourly frequency, and the Y axis shows temperature in degrees Celsius.
plt.figure(figsize=(15, 5))
data_ts.plot(color='b')
plt.title('Temperatura superficial - Soudeste Brasil A601 (Mayo 2000 - Abril 2021)')
plt.xlabel('Tiempo - Frecuencia horaria')
plt.ylabel('Temperature - °C');
Below is the Python code for the SES model.
This code implements a first-order simple exponential smoothing (SES) model to forecast the temperature time series.
It is based on a smoothing function taken from the ones developed in class.
It splits the data into training, validation, and test sets and produces rolling forecasts over moving windows.
It evaluates model performance with MAPE, MAE, RMSE, and MSE, and runs statistical tests (Ljung-Box, Wald-Wolfowitz, and Kolmogorov-Smirnov) on the residuals.
Finally, it plots the forecasts (annotated with the normality and randomness test results), the residuals, and the autocorrelation analysis, and stores the residuals in tables for later use.
The metric results and the interpretation of the normality and randomness tests are presented in separate tables.
# JDF-2025Mar09 - SES with Rolling Improvement for Validation and Test Sets
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error, mean_squared_error
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf
import scipy.stats as stats
from statsmodels.sandbox.stats.runs import runstest_1samp # For Wald-Wolfowitz test
# First-order exponential smoothing function
def firstsmooth(y, lambda_, start=None):
"""
First-order exponential smoothing.
Parameters:
y (pd.Series or np.array): Input time series.
lambda_ (float): Smoothing parameter (0 < lambda_ < 1).
start (float): Initial value for smoothing. If None, uses y[0].
Returns:
pd.Series: Smoothed values.
"""
ytilde = y.copy()
if start is None:
start = y[0]
ytilde[0] = lambda_ * y[0] + (1 - lambda_) * start
for i in range(1, len(y)):
ytilde[i] = lambda_ * y[i] + (1 - lambda_) * ytilde[i - 1]
return ytilde
# Function to forecast using first-order exponential smoothing
def forecast_firstsmooth(train, steps, lambda_):
"""
Forecast future values using first-order exponential smoothing.
Parameters:
train (pd.Series): Training data.
steps (int): Number of steps to forecast.
lambda_ (float): Smoothing parameter (0 < lambda_ < 1).
Returns:
pd.Series: Forecasted values.
"""
# Fit the model on the training data
smoothed_train = firstsmooth(train, lambda_)
# The last smoothed value is the starting point for forecasting
last_smoothed = smoothed_train.iloc[-1]
# Forecast future values
forecast = np.zeros(steps)
forecast[0] = lambda_ * train.iloc[-1] + (1 - lambda_) * last_smoothed
    for i in range(1, steps):
        forecast[i] = forecast[i - 1]  # SES forecasts are flat beyond the first step
# Convert forecast to a pandas Series with appropriate index
forecast_index = pd.date_range(start=train.index[-1], periods=steps + 1, freq='H')[1:]
return pd.Series(forecast, index=forecast_index)
nHours = 24
nDaysYear = 365
NoYears = 3
backHorizon = nHours * nDaysYear * NoYears
data_ts = data_ori
data_ts = data_ts[-backHorizon:]
# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24
# Test: Last 24 values
test = data_ts.iloc[-test_size:]
# Validation: 24 values before test
validation = data_ts.iloc[-(test_size + val_size):-test_size]
# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]
# 2. Rolling Forecast with a 1-hour window
lambda_ = 0.5 # Smoothing parameter
window_size = 1 # Rolling window size in hours
# Initialize lists to store rolling forecasts and actuals for Validation and Test
val_rolling_forecasts = []
val_rolling_actuals = []
test_rolling_forecasts = []
test_rolling_actuals = []
# Perform rolling forecasts for Validation set
for i in range(0, len(validation), window_size):
# Update training data with the most recent window
current_train = pd.concat([train, validation.iloc[:i]])
# Forecast for the next window
current_forecast = forecast_firstsmooth(current_train, steps=window_size, lambda_=lambda_)
# Append forecasts and actuals for Validation
val_rolling_forecasts.extend(current_forecast.values)
val_rolling_actuals.extend(validation.iloc[i:i + window_size].values)
# Convert lists to pandas Series for Validation
val_rolling_forecasts = pd.Series(val_rolling_forecasts, index=validation.index[:len(val_rolling_forecasts)])
val_rolling_actuals = pd.Series(val_rolling_actuals, index=validation.index[:len(val_rolling_actuals)])
# Perform rolling forecasts for Test set
combined_train_val = pd.concat([train, validation]) # Combine training and validation data
for i in range(0, len(test), window_size):
# Update training data with the most recent window
current_train = pd.concat([combined_train_val, test.iloc[:i]])
# Forecast for the next window
current_forecast = forecast_firstsmooth(current_train, steps=window_size, lambda_=lambda_)
# Append forecasts and actuals for Test
test_rolling_forecasts.extend(current_forecast.values)
test_rolling_actuals.extend(test.iloc[i:i + window_size].values)
# Convert lists to pandas Series for Test
test_rolling_forecasts = pd.Series(test_rolling_forecasts, index=test.index[:len(test_rolling_forecasts)])
test_rolling_actuals = pd.Series(test_rolling_actuals, index=test.index[:len(test_rolling_actuals)])
# 3. Calculate metrics for rolling forecasts
def calculate_metrics(actual, forecast):
"""
Calculate MAPE, MAE, RMSE, and MSE.
"""
actual = actual.replace(0, 1e-9) # Avoid division by zero
metrics = {
'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
'MAE': mean_absolute_error(actual, forecast),
'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
'MSE': mean_squared_error(actual, forecast),
}
return metrics
# Rolling forecast metrics for Validation
val_rolling_metrics = calculate_metrics(val_rolling_actuals, val_rolling_forecasts)
val_rolling_residuals = val_rolling_actuals - val_rolling_forecasts
# Rolling forecast metrics for Test
test_rolling_metrics = calculate_metrics(test_rolling_actuals, test_rolling_forecasts)
test_rolling_residuals = test_rolling_actuals - test_rolling_forecasts
# Calculate Ljung-Box p-value for rolling residuals
val_rolling_ljungbox = acorr_ljungbox(val_rolling_residuals, lags=10, return_df=True)
val_rolling_metrics['Ljung-Box p-value'] = val_rolling_ljungbox['lb_pvalue'].values[0]
test_rolling_ljungbox = acorr_ljungbox(test_rolling_residuals, lags=10, return_df=True)
test_rolling_metrics['Ljung-Box p-value'] = test_rolling_ljungbox['lb_pvalue'].values[0]
# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_rolling_residuals)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_rolling_residuals)
# Add randomness test results to metrics
val_rolling_metrics['Randomness p-value'] = val_runs_pvalue
test_rolling_metrics['Randomness p-value'] = test_runs_pvalue
# Perform Kolmogorov-Smirnov Normality Test
val_ks_stat, val_ks_pvalue = stats.kstest(val_rolling_residuals, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_rolling_residuals, 'norm')
# Add normality test results to metrics
val_rolling_metrics['KS-Normality p-value'] = val_ks_pvalue
test_rolling_metrics['KS-Normality p-value'] = test_ks_pvalue
# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_rolling_metrics, test_rolling_metrics], index=['Validation', 'Test'])
# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])
# Print the tables
print("Metrics Table:")
display(metrics_table.round(4)) # Use display() to render the DataFrame
print("\nTest Interpretation Table:")
display(interpretation_table.round(4)) # Use display() to render the DataFrame
# 4. Plot Training, Validation, Test, and Rolling Forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(val_rolling_forecasts.index, val_rolling_forecasts, label='Validation Rolling Forecast', color='orange', linestyle='--')
plt.plot(test_rolling_forecasts.index, test_rolling_forecasts, label='Test Rolling Forecast', color='purple', linestyle='--')
plt.title(f'Rolling First-Order Exponential Smoothing Forecast (Lambda={lambda_})\nValidation MAE: {val_rolling_metrics["MAE"]:.2f}, Test MAE: {test_rolling_metrics["MAE"]:.2f}')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 5. Residual Analysis: Validation and Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Quadrant 0,0: Residuals vs Time (Validation)
axes[0, 0].plot(val_rolling_residuals.index, val_rolling_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Quadrant 0,1: Residuals Density and Histogram (Validation)
val_rolling_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', color='blue', ax=axes[0, 1])
val_rolling_residuals.plot(kind='kde', label='Validation Residuals Density', color='blue', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Quadrant 1,0: Q-Q Plot (Validation)
stats.probplot(val_rolling_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)
# Quadrant 1,1: ACF (Validation)
plot_acf(val_rolling_residuals, lags=20, ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 6. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Quadrant 0,0: Residuals vs Time (Test)
axes[0, 0].plot(test_rolling_residuals.index, test_rolling_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Quadrant 0,1: Residuals Density and Histogram (Test)
test_rolling_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', color='red', ax=axes[0, 1])
test_rolling_residuals.plot(kind='kde', label='Test Residuals Density', color='red', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Quadrant 1,0: Q-Q Plot (Test)
stats.probplot(test_rolling_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)
# Quadrant 1,1: ACF (Test)
plot_acf(test_rolling_residuals, lags=20, ax=axes[1, 1], title='ACF of Test Residuals', color='red')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Add the model: SES
tempSeries = pd.Series(val_rolling_residuals)
ResTableVal.loc[0] = {"ModelName": "SES", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_rolling_residuals)
ResTableTest.loc[0] = {"ModelName": "SES", "ResValues": tempSeries.values}
Metrics Table:
| | MAPE | MAE | RMSE | MSE | Ljung-Box p-value | Randomness p-value | KS-Normality p-value |
|---|---|---|---|---|---|---|---|
| Validation | 3.9233 | 0.8110 | 1.0493 | 1.1010 | 0.0006 | 0.0080 | 0.1180 |
| Test | 4.1570 | 0.8646 | 1.1998 | 1.4395 | 0.0001 | 0.1734 | 0.6289 |
Test Interpretation Table:
| | Randomness | KS-Normality |
|---|---|---|
| Validation | Not Random | Normal |
| Test | Random | Normal |
This code implements a second-order (double) exponential smoothing model (DES) to forecast the temperature time series.
It is based on a smoothing function taken from the ones developed in class, applied recursively for second-order smoothing.
It splits the data into training, validation, and test sets and produces rolling forecasts over moving windows.
It evaluates model performance with MAPE, MAE, RMSE, and MSE, and runs statistical tests (Ljung-Box, Wald-Wolfowitz, and Kolmogorov-Smirnov) on the residuals.
Finally, it plots the forecasts (annotated with the normality and randomness test results), the residuals, and the autocorrelation analysis, and stores the residuals in tables for later use.
The metric results and the interpretation of the normality and randomness tests are presented in separate tables.
# JDF-2025Mar09 - DES with Rolling Improvement for Validation and Test Sets
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error
from statsmodels.sandbox.stats.runs import runstest_1samp  # For Wald-Wolfowitz test
# Second-order exponential smoothing function
def secondsmooth(y, alpha, beta, start_level=None, start_trend=None):
"""
Second-order exponential smoothing (Holt's linear trend method).
Parameters:
y (pd.Series or np.array): Input time series.
alpha (float): Smoothing parameter for level (0 < alpha < 1).
beta (float): Smoothing parameter for trend (0 < beta < 1).
start_level (float): Initial level value. If None, uses y[0].
start_trend (float): Initial trend value. If None, uses y[1] - y[0].
Returns:
pd.Series: Smoothed values.
"""
ytilde = np.zeros_like(y)
if start_level is None:
start_level = y[0]
if start_trend is None:
start_trend = y[1] - y[0]
level = start_level
trend = start_trend
for i in range(len(y)):
ytilde[i] = level + trend
if i < len(y) - 1:
new_level = alpha * y[i] + (1 - alpha) * (level + trend)
new_trend = beta * (new_level - level) + (1 - beta) * trend
level, trend = new_level, new_trend
return pd.Series(ytilde, index=y.index)
# Function to forecast using second-order exponential smoothing
def forecast_secondsmooth(train, steps, alpha, beta):
"""
Forecast future values using second-order exponential smoothing.
Parameters:
train (pd.Series): Training data.
steps (int): Number of steps to forecast.
alpha (float): Smoothing parameter for level (0 < alpha < 1).
beta (float): Smoothing parameter for trend (0 < beta < 1).
Returns:
pd.Series: Forecasted values.
"""
# Fit the model on the training data
smoothed_train = secondsmooth(train, alpha, beta)
# Extract the last level and trend
last_level = alpha * train.iloc[-1] + (1 - alpha) * (smoothed_train.iloc[-2] + (smoothed_train.iloc[-1] - smoothed_train.iloc[-2]))
last_trend = beta * (last_level - smoothed_train.iloc[-2]) + (1 - beta) * (smoothed_train.iloc[-1] - smoothed_train.iloc[-2])
# Forecast future values
forecast = np.zeros(steps)
forecast[0] = last_level + last_trend
for i in range(1, steps):
forecast[i] = forecast[i - 1] + last_trend
# Convert forecast to a pandas Series with appropriate index
forecast_index = pd.date_range(start=train.index[-1], periods=steps + 1, freq='H')[1:]
return pd.Series(forecast, index=forecast_index)
nHours = 24
nDaysYear = 365
NoYears = 3
backHorizon = nHours * nDaysYear * NoYears
data_ts = data_ori
data_ts = data_ts[-backHorizon:]
# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24
# Test: Last 24 values
test = data_ts.iloc[-test_size:]
# Validation: 24 values before test
validation = data_ts.iloc[-(test_size + val_size):-test_size]
# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]
# 2. Rolling Forecast with a 1-hour window
alpha = 0.5 # Smoothing parameter for level
beta = 0.3 # Smoothing parameter for trend
window_size = 1 # Rolling window size in hours
# Initialize lists to store rolling forecasts and actuals for Validation and Test
val_rolling_forecasts = []
val_rolling_actuals = []
test_rolling_forecasts = []
test_rolling_actuals = []
# Perform rolling forecasts for Validation set
for i in range(0, len(validation), window_size):
# Update training data with the most recent window
current_train = pd.concat([train, validation.iloc[:i]])
# Forecast for the next window
current_forecast = forecast_secondsmooth(current_train, steps=window_size, alpha=alpha, beta=beta)
# Append forecasts and actuals for Validation
val_rolling_forecasts.extend(current_forecast.values)
val_rolling_actuals.extend(validation.iloc[i:i + window_size].values)
# Convert lists to pandas Series for Validation
val_rolling_forecasts = pd.Series(val_rolling_forecasts, index=validation.index[:len(val_rolling_forecasts)])
val_rolling_actuals = pd.Series(val_rolling_actuals, index=validation.index[:len(val_rolling_actuals)])
# Perform rolling forecasts for Test set
combined_train_val = pd.concat([train, validation]) # Combine training and validation data
for i in range(0, len(test), window_size):
# Update training data with the most recent window
current_train = pd.concat([combined_train_val, test.iloc[:i]])
# Forecast for the next window
current_forecast = forecast_secondsmooth(current_train, steps=window_size, alpha=alpha, beta=beta)
# Append forecasts and actuals for Test
test_rolling_forecasts.extend(current_forecast.values)
test_rolling_actuals.extend(test.iloc[i:i + window_size].values)
# Convert lists to pandas Series for Test
test_rolling_forecasts = pd.Series(test_rolling_forecasts, index=test.index[:len(test_rolling_forecasts)])
test_rolling_actuals = pd.Series(test_rolling_actuals, index=test.index[:len(test_rolling_actuals)])
# 3. Calculate metrics for rolling forecasts
def calculate_metrics(actual, forecast):
"""
Calculate MAPE, MAE, RMSE, and MSE.
"""
actual = actual.replace(0, 1e-9) # Avoid division by zero
metrics = {
'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
'MAE': mean_absolute_error(actual, forecast),
'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
'MSE': mean_squared_error(actual, forecast),
}
return metrics
# Rolling forecast metrics for Validation
val_rolling_metrics = calculate_metrics(val_rolling_actuals, val_rolling_forecasts)
val_rolling_residuals = val_rolling_actuals - val_rolling_forecasts
# Rolling forecast metrics for Test
test_rolling_metrics = calculate_metrics(test_rolling_actuals, test_rolling_forecasts)
test_rolling_residuals = test_rolling_actuals - test_rolling_forecasts
# Calculate Ljung-Box p-value for rolling residuals
val_rolling_ljungbox = acorr_ljungbox(val_rolling_residuals, lags=10, return_df=True)
val_rolling_metrics['Ljung-Box p-value'] = val_rolling_ljungbox['lb_pvalue'].values[0]
test_rolling_ljungbox = acorr_ljungbox(test_rolling_residuals, lags=10, return_df=True)
test_rolling_metrics['Ljung-Box p-value'] = test_rolling_ljungbox['lb_pvalue'].values[0]
# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_rolling_residuals)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_rolling_residuals)
# Add randomness test results to metrics
val_rolling_metrics['Randomness p-value'] = val_runs_pvalue
test_rolling_metrics['Randomness p-value'] = test_runs_pvalue
# Perform Kolmogorov-Smirnov Normality Test
val_ks_stat, val_ks_pvalue = stats.kstest(val_rolling_residuals, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_rolling_residuals, 'norm')
# Add normality test results to metrics
val_rolling_metrics['KS-Normality p-value'] = val_ks_pvalue
test_rolling_metrics['KS-Normality p-value'] = test_ks_pvalue
# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_rolling_metrics, test_rolling_metrics], index=['Validation', 'Test'])
# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])
# Print the tables
print("Metrics Table:")
display(metrics_table.round(4)) # Use display() to render the DataFrame
print("\nTest Interpretation Table:")
display(interpretation_table.round(4)) # Use display() to render the DataFrame
# 4. Plot Training, Validation, Test, and Rolling Forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(val_rolling_forecasts.index, val_rolling_forecasts, label='Validation Rolling Forecast', color='orange', linestyle='--')
plt.plot(test_rolling_forecasts.index, test_rolling_forecasts, label='Test Rolling Forecast', color='purple', linestyle='--')
plt.title(f'Rolling Second-Order Exponential Smoothing Forecast (Alpha={alpha}, Beta={beta})\nValidation MAE: {val_rolling_metrics["MAE"]:.2f}, Test MAE: {test_rolling_metrics["MAE"]:.2f}')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 5. Residual Analysis: Validation and Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Quadrant 0,0: Residuals vs Time (Validation)
axes[0, 0].plot(val_rolling_residuals.index, val_rolling_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Quadrant 0,1: Residuals Density and Histogram (Validation)
val_rolling_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', color='blue', ax=axes[0, 1])
val_rolling_residuals.plot(kind='kde', label='Validation Residuals Density', color='blue', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Quadrant 1,0: Q-Q Plot (Validation)
stats.probplot(val_rolling_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)
# Quadrant 1,1: ACF (Validation)
plot_acf(val_rolling_residuals, lags=20, ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 6. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Quadrant 0,0: Residuals vs Time (Test)
axes[0, 0].plot(test_rolling_residuals.index, test_rolling_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Quadrant 0,1: Residuals Density and Histogram (Test)
test_rolling_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', color='red', ax=axes[0, 1])
test_rolling_residuals.plot(kind='kde', label='Test Residuals Density', color='red', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Quadrant 1,0: Q-Q Plot (Test)
stats.probplot(test_rolling_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)
# Quadrant 1,1: ACF (Test)
plot_acf(test_rolling_residuals, lags=20, ax=axes[1, 1], title='ACF of Test Residuals', color='red')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Add the model: DES
tempSeries = pd.Series(val_rolling_residuals)
ResTableVal.loc[1] = {"ModelName": "DES", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_rolling_residuals)
ResTableTest.loc[1] = {"ModelName": "DES", "ResValues": tempSeries.values}
Metrics Table:
| | MAPE | MAE | RMSE | MSE | Ljung-Box p-value | Randomness p-value | KS-Normality p-value |
|---|---|---|---|---|---|---|---|
| Validation | 4.3579 | 0.9152 | 1.1965 | 1.4316 | 0.0018 | 0.0263 | 0.8145 |
| Test | 5.4639 | 1.1326 | 1.3748 | 1.8901 | 0.0004 | 0.0228 | 0.7360 |
Test Interpretation Table:
| | Randomness | KS-Normality |
|---|---|---|
| Validation | Not Random | Normal |
| Test | Not Random | Normal |
This code implements a Holt-Winters simple exponential smoothing (SES-HW) model using the statsmodels library to forecast the air-temperature time series at station A601 in southeastern Brazil.
It splits the data into training, validation, and test sets and produces rolling (moving-window) forecasts.
It evaluates the model with MAPE, MAE, RMSE, and MSE, and applies statistical tests (Ljung-Box, Wald-Wolfowitz, and Kolmogorov-Smirnov) to the residuals.
Finally, it plots the forecasts (annotated with the results of the normality and randomness tests), the residuals, and their autocorrelation, and stores the residuals in tables for later use.
The metric values and the interpretation of the normality and randomness tests are presented in separate tables.
# JDF-2025Mar09 - SES-HW with Rolling Improvement for Validation and Test Sets
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error
nHours = 24
nDaysYear = 365
NoYears = 3
backHorizon = nHours * nDaysYear * NoYears
data_ts = data_ori
data_ts = data_ts[-backHorizon:]
# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24
# Test: Last 24 values
test = data_ts.iloc[-test_size:]
# Validation: 24 values before test
validation = data_ts.iloc[-(test_size + val_size):-test_size]
# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]
# 2. Create a first-order SES model
def ses_forecast(train, steps, alpha=0.5):
"""
Simple Exponential Smoothing (SES) forecast.
Parameters:
train (pd.Series): Training data.
steps (int): Number of steps to forecast.
alpha (float): Smoothing parameter (0 < alpha < 1).
Returns:
tuple: The forecast (pd.Series) and the fitted model.
"""
model = SimpleExpSmoothing(train)
model_fit = model.fit(smoothing_level=alpha, optimized=False)
forecast = model_fit.forecast(steps=steps)
return forecast, model_fit
# 3. Rolling Forecast with a 1-hour window
alpha = 0.5 # Smoothing parameter
window_size = 1 # Rolling window size in hours
# Initialize lists to store rolling forecasts and actuals for Validation and Test
val_rolling_forecasts = []
val_rolling_actuals = []
test_rolling_forecasts = []
test_rolling_actuals = []
# Perform rolling forecasts for Validation set
for i in range(0, len(validation), window_size):
# Update training data with the most recent window
current_train = pd.concat([train, validation.iloc[:i]])
# Forecast for the next window
current_forecast, _ = ses_forecast(current_train, steps=window_size, alpha=alpha)
# Append forecasts and actuals for Validation
val_rolling_forecasts.extend(current_forecast.values)
val_rolling_actuals.extend(validation.iloc[i:i + window_size].values)
# Convert lists to pandas Series for Validation
val_rolling_forecasts = pd.Series(val_rolling_forecasts, index=validation.index[:len(val_rolling_forecasts)])
val_rolling_actuals = pd.Series(val_rolling_actuals, index=validation.index[:len(val_rolling_actuals)])
# Perform rolling forecasts for Test set
combined_train_val = pd.concat([train, validation]) # Combine training and validation data
for i in range(0, len(test), window_size):
# Update training data with the most recent window
current_train = pd.concat([combined_train_val, test.iloc[:i]])
# Forecast for the next window
current_forecast, _ = ses_forecast(current_train, steps=window_size, alpha=alpha)
# Append forecasts and actuals for Test
test_rolling_forecasts.extend(current_forecast.values)
test_rolling_actuals.extend(test.iloc[i:i + window_size].values)
# Convert lists to pandas Series for Test
test_rolling_forecasts = pd.Series(test_rolling_forecasts, index=test.index[:len(test_rolling_forecasts)])
test_rolling_actuals = pd.Series(test_rolling_actuals, index=test.index[:len(test_rolling_actuals)])
# 4. Calculate metrics for rolling forecasts
def calculate_metrics(actual, forecast):
"""
Calculate MAPE, MAE, RMSE, MSE, and Ljung-Box p-value.
"""
actual = actual.replace(0, 1e-9) # Avoid division by zero
metrics = {
'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
'MAE': mean_absolute_error(actual, forecast),
'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
'MSE': mean_squared_error(actual, forecast),
}
return metrics
# Rolling forecast metrics for Validation
val_rolling_metrics = calculate_metrics(val_rolling_actuals, val_rolling_forecasts)
val_rolling_residuals = val_rolling_actuals - val_rolling_forecasts
val_rolling_metrics['Ljung-Box p-value'] = acorr_ljungbox(val_rolling_residuals, lags=10, return_df=True)['lb_pvalue'].values[0]  # p-value at lag 1
# Rolling forecast metrics for Test
test_rolling_metrics = calculate_metrics(test_rolling_actuals, test_rolling_forecasts)
test_rolling_residuals = test_rolling_actuals - test_rolling_forecasts
test_rolling_metrics['Ljung-Box p-value'] = acorr_ljungbox(test_rolling_residuals, lags=10, return_df=True)['lb_pvalue'].values[0]  # p-value at lag 1
# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_rolling_residuals)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_rolling_residuals)
# Add randomness test results to metrics
val_rolling_metrics['Randomness p-value'] = val_runs_pvalue
test_rolling_metrics['Randomness p-value'] = test_runs_pvalue
# Perform Kolmogorov-Smirnov Normality Test
val_ks_stat, val_ks_pvalue = stats.kstest(val_rolling_residuals, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_rolling_residuals, 'norm')
# Add normality test results to metrics
val_rolling_metrics['KS-Normality p-value'] = val_ks_pvalue
test_rolling_metrics['KS-Normality p-value'] = test_ks_pvalue
# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_rolling_metrics, test_rolling_metrics], index=['Validation', 'Test'])
# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])
# Print the tables
print("Metrics Table:")
display(metrics_table.round(4)) # Use display() to render the DataFrame
print("\nTest Interpretation Table:")
display(interpretation_table.round(4)) # Use display() to render the DataFrame
# 5. Plot Training, Validation, Test, and Rolling Forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(val_rolling_forecasts.index, val_rolling_forecasts, label='Validation Rolling Forecast', color='orange', linestyle='--')
plt.plot(test_rolling_forecasts.index, test_rolling_forecasts, label='Test Rolling Forecast', color='purple', linestyle='--')
plt.title(f'SES Rolling Forecast (Alpha={alpha})\nValidation MAE: {val_rolling_metrics["MAE"]:.2f}, Test MAE: {test_rolling_metrics["MAE"]:.2f}')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 6. Residual Analysis: Validation and Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Quadrant 0,0: Residuals vs Time (Validation)
axes[0, 0].plot(val_rolling_residuals.index, val_rolling_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
#axes[0, 0].set_title('Validation Residuals vs Time')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Quadrant 0,1: Residuals Density and Histogram (Validation)
val_rolling_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', color='blue', ax=axes[0, 1])
val_rolling_residuals.plot(kind='kde', label='Validation Residuals Density', color='blue', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Quadrant 1,0: Q-Q Plot (Validation)
stats.probplot(val_rolling_residuals, dist="norm", plot=axes[1, 0])
#axes[1, 0].set_title('Validation Residuals Q-Q Plot')
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)
# Quadrant 1,1: ACF (Validation)
plot_acf(val_rolling_residuals, lags=20, ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 7. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Quadrant 0,0: Residuals vs Time (Test)
axes[0, 0].plot(test_rolling_residuals.index, test_rolling_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
#axes[0, 0].set_title('Test Residuals vs Time')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Quadrant 0,1: Residuals Density and Histogram (Test)
test_rolling_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', color='red', ax=axes[0, 1])
test_rolling_residuals.plot(kind='kde', label='Test Residuals Density', color='red', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Quadrant 1,0: Q-Q Plot (Test)
stats.probplot(test_rolling_residuals, dist="norm", plot=axes[1, 0])
# axes[1, 0].set_title('Test Residuals Q-Q Plot')
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)
# Quadrant 1,1: ACF (Test)
plot_acf(test_rolling_residuals, lags=20, ax=axes[1, 1], title='ACF of Test Residuals', color='red')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Add the model: SES-HW
tempSeries = pd.Series(val_rolling_residuals)
ResTableVal.loc[2] = {"ModelName": "SES-HW", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_rolling_residuals)
ResTableTest.loc[2] = {"ModelName": "SES-HW", "ResValues": tempSeries.values}
Metrics Table:
| | MAPE | MAE | RMSE | MSE | Ljung-Box p-value | Randomness p-value | KS-Normality p-value |
|---|---|---|---|---|---|---|---|
| Validation | 4.9992 | 1.0321 | 1.2778 | 1.6328 | 0.0001 | 0.0026 | 0.0457 |
| Test | 5.1689 | 1.0843 | 1.4966 | 2.2397 | 0.0000 | 0.0026 | 0.5640 |
Test Interpretation Table:
| | Randomness | KS-Normality |
|---|---|---|
| Validation | Not Random | Not Normal |
| Test | Not Random | Normal |
Below is the Python code to build the DES-HW model.
This code implements a Holt-Winters double exponential smoothing (DES-HW) model using the statsmodels library to forecast the air-temperature time series measured at meteorological station A601 in southeastern Brazil.
It splits the data into training, validation, and test sets and produces rolling (moving-window) forecasts.
It evaluates the model with MAPE, MAE, RMSE, and MSE, and applies statistical tests (Ljung-Box, Wald-Wolfowitz, and Kolmogorov-Smirnov) to the residuals.
Finally, it plots the forecasts, residuals, and autocorrelation analysis, and stores the residuals in tables for later use.
The metric values and the interpretation of the normality and randomness tests are presented in separate tables.
# JDF-2025Mar09 - DES-HW with Rolling Improvement for Validation and Test Sets
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error
nHours = 24
nDaysYear = 365
NoYears = 3
backHorizon = nHours * nDaysYear * NoYears
data_ts = data_ori
data_ts = data_ts[-backHorizon:]
# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24
# Test: Last 24 values
test = data_ts.iloc[-test_size:]
# Validation: 24 values before test
validation = data_ts.iloc[-(test_size + val_size):-test_size]
# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]
# 2. Create a Double Exponential Smoothing (DES) model
def des_forecast(train, steps, alpha=0.5, beta=0.5):
"""
Double Exponential Smoothing (DES) forecast.
Parameters:
train (pd.Series): Training data.
steps (int): Number of steps to forecast.
alpha (float): Smoothing parameter for level (0 < alpha < 1).
beta (float): Smoothing parameter for trend (0 < beta < 1).
Returns:
tuple: The forecast (pd.Series) and the fitted model.
"""
model = ExponentialSmoothing(train, trend='add', seasonal=None)
model_fit = model.fit(smoothing_level=alpha, smoothing_trend=beta, optimized=False)
forecast = model_fit.forecast(steps=steps)
return forecast, model_fit
# 3. Rolling Forecast with a 1-hour window
alpha = 0.5 # Smoothing parameter for level
beta = 0.5 # Smoothing parameter for trend
window_size = 1 # Rolling window size in hours
# Initialize lists to store rolling forecasts and actuals for Validation and Test
val_rolling_forecasts = []
val_rolling_actuals = []
test_rolling_forecasts = []
test_rolling_actuals = []
# Perform rolling forecasts for Validation set
for i in range(0, len(validation), window_size):
# Update training data with the most recent window
current_train = pd.concat([train, validation.iloc[:i]])
# Forecast for the next window
current_forecast, _ = des_forecast(current_train, steps=window_size, alpha=alpha, beta=beta)
# Append forecasts and actuals for Validation
val_rolling_forecasts.extend(current_forecast.values)
val_rolling_actuals.extend(validation.iloc[i:i + window_size].values)
# Convert lists to pandas Series for Validation
val_rolling_forecasts = pd.Series(val_rolling_forecasts, index=validation.index[:len(val_rolling_forecasts)])
val_rolling_actuals = pd.Series(val_rolling_actuals, index=validation.index[:len(val_rolling_actuals)])
# Perform rolling forecasts for Test set
combined_train_val = pd.concat([train, validation]) # Combine training and validation data
for i in range(0, len(test), window_size):
# Update training data with the most recent window
current_train = pd.concat([combined_train_val, test.iloc[:i]])
# Forecast for the next window
current_forecast, _ = des_forecast(current_train, steps=window_size, alpha=alpha, beta=beta)
# Append forecasts and actuals for Test
test_rolling_forecasts.extend(current_forecast.values)
test_rolling_actuals.extend(test.iloc[i:i + window_size].values)
# Convert lists to pandas Series for Test
test_rolling_forecasts = pd.Series(test_rolling_forecasts, index=test.index[:len(test_rolling_forecasts)])
test_rolling_actuals = pd.Series(test_rolling_actuals, index=test.index[:len(test_rolling_actuals)])
# 4. Calculate metrics for rolling forecasts
def calculate_metrics(actual, forecast):
"""
Calculate MAPE, MAE, RMSE, MSE, and Ljung-Box p-value.
"""
actual = actual.replace(0, 1e-9) # Avoid division by zero
metrics = {
'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
'MAE': mean_absolute_error(actual, forecast),
'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
'MSE': mean_squared_error(actual, forecast),
}
return metrics
# Rolling forecast metrics for Validation
val_rolling_metrics = calculate_metrics(val_rolling_actuals, val_rolling_forecasts)
val_rolling_residuals = val_rolling_actuals - val_rolling_forecasts
val_rolling_metrics['Ljung-Box p-value'] = acorr_ljungbox(val_rolling_residuals, lags=10, return_df=True)['lb_pvalue'].values[0]
# Rolling forecast metrics for Test
test_rolling_metrics = calculate_metrics(test_rolling_actuals, test_rolling_forecasts)
test_rolling_residuals = test_rolling_actuals - test_rolling_forecasts
test_rolling_metrics['Ljung-Box p-value'] = acorr_ljungbox(test_rolling_residuals, lags=10, return_df=True)['lb_pvalue'].values[0]
# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_rolling_residuals)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_rolling_residuals)
# Add randomness test results to metrics
val_rolling_metrics['Randomness p-value'] = val_runs_pvalue
test_rolling_metrics['Randomness p-value'] = test_runs_pvalue
# Perform Kolmogorov-Smirnov Normality Test
val_ks_stat, val_ks_pvalue = stats.kstest(val_rolling_residuals, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_rolling_residuals, 'norm')
# Add normality test results to metrics
val_rolling_metrics['KS-Normality p-value'] = val_ks_pvalue
test_rolling_metrics['KS-Normality p-value'] = test_ks_pvalue
# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_rolling_metrics, test_rolling_metrics], index=['Validation', 'Test'])
# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])
# Print the tables
print("Metrics Table:")
display(metrics_table.round(4)) # Use display() to render the DataFrame
print("\nTest Interpretation Table:")
display(interpretation_table.round(4)) # Use display() to render the DataFrame
# 5. Plot Training, Validation, Test, and Rolling Forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(val_rolling_forecasts.index, val_rolling_forecasts, label='Validation Rolling Forecast', color='orange', linestyle='--')
plt.plot(test_rolling_forecasts.index, test_rolling_forecasts, label='Test Rolling Forecast', color='purple', linestyle='--')
plt.title(f'DES Rolling Forecast (Alpha={alpha}, Beta={beta})\nValidation MAE: {val_rolling_metrics["MAE"]:.2f}, Test MAE: {test_rolling_metrics["MAE"]:.2f}')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 6. Residual Analysis: Validation and Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Quadrant 0,0: Residuals vs Time (Validation)
axes[0, 0].plot(val_rolling_residuals.index, val_rolling_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
#axes[0, 0].set_title('Validation Residuals vs Time')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Quadrant 0,1: Residuals Density and Histogram (Validation)
val_rolling_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', color='blue', ax=axes[0, 1])
val_rolling_residuals.plot(kind='kde', label='Validation Residuals Density', color='blue', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Quadrant 1,0: Q-Q Plot (Validation)
stats.probplot(val_rolling_residuals, dist="norm", plot=axes[1, 0])
#axes[1, 0].set_title('Validation Residuals Q-Q Plot')
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)
# Quadrant 1,1: ACF (Validation)
plot_acf(val_rolling_residuals, lags=20, ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 7. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Quadrant 0,0: Residuals vs Time (Test)
axes[0, 0].plot(test_rolling_residuals.index, test_rolling_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
#axes[0, 0].set_title('Test Residuals vs Time')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Quadrant 0,1: Residuals Density and Histogram (Test)
test_rolling_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', color='red', ax=axes[0, 1])
test_rolling_residuals.plot(kind='kde', label='Test Residuals Density', color='red', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Quadrant 1,0: Q-Q Plot (Test)
stats.probplot(test_rolling_residuals, dist="norm", plot=axes[1, 0])
#axes[1, 0].set_title('Test Residuals Q-Q Plot')
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)
# Quadrant 1,1: ACF (Test)
plot_acf(test_rolling_residuals, lags=20, ax=axes[1, 1], title='ACF of Test Residuals', color='red')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Add the model: DES-HW
tempSeries = pd.Series(val_rolling_residuals)
ResTableVal.loc[3] = {"ModelName": "DES-HW", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_rolling_residuals)
ResTableTest.loc[3] = {"ModelName": "DES-HW", "ResValues": tempSeries.values}
Metrics Table:
| | MAPE | MAE | RMSE | MSE | Ljung-Box p-value | Randomness p-value | KS-Normality p-value |
|---|---|---|---|---|---|---|---|
| Validation | 4.5535 | 0.9613 | 1.2635 | 1.5964 | 0.0006 | 0.007 | 0.8043 |
| Test | 5.5504 | 1.1652 | 1.4165 | 2.0066 | 0.0001 | 0.008 | 0.3423 |
Test Interpretation Table:
| | Randomness | KS-Normality |
|---|---|---|
| Validation | Not Random | Normal |
| Test | Not Random | Normal |
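The residuals accumulated in ResTableVal and ResTableTest are stored for later comparison across models. A sketch of how they could be summarized, using hypothetical residual arrays (not the actual stored values):

```python
import numpy as np
import pandas as pd

# Hypothetical residual arrays standing in for the values stored by the cells above
ResTableVal = pd.DataFrame(columns=["ModelName", "ResValues"])
rng = np.random.default_rng(1)
ResTableVal.loc[1] = {"ModelName": "SES-HW", "ResValues": rng.normal(0, 1.3, 24)}
ResTableVal.loc[2] = {"ModelName": "DES-HW", "ResValues": rng.normal(0, 1.2, 24)}

# Recompute error metrics directly from each model's stored residuals
summary = pd.DataFrame({
    "ModelName": ResTableVal["ModelName"],
    "MAE": ResTableVal["ResValues"].apply(lambda r: np.mean(np.abs(r))),
    "RMSE": ResTableVal["ResValues"].apply(lambda r: np.sqrt(np.mean(r ** 2))),
}).set_index("ModelName").sort_values("RMSE")
print(summary)
```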
Moving-average method: a window of fixed width slides along the series, and the mean of the points inside the window forms each new data point.
Exponential smoothing method: new predictions are weighted averages of past data points, with recent observations weighted more heavily than older ones, on the assumption that a series is most closely related to its recent past.
Variants of exponential smoothing: the method divides into simple, double, and triple exponential smoothing, which model the level, trend, and seasonality components of the data, respectively.
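The weighting described above can be made explicit: the simple exponential smoothing recursion l_t = α·y_t + (1 − α)·l_{t−1} assigns the observation k steps in the past a weight of α(1 − α)^k. A minimal sketch with hypothetical numbers:

```python
import numpy as np

def ses_level(y, alpha=0.5):
    """Iterative SES update: blend each new observation with the previous level."""
    level = y[0]  # initialize with the first observation
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

y = np.array([20.0, 21.0, 19.5, 22.0, 23.0])
level = ses_level(y, alpha=0.5)  # → 22.0

# Closed form: weights decay geometrically with the age of each observation
alpha = 0.5
n = len(y)
weights = alpha * (1 - alpha) ** np.arange(n - 1)               # for y[n-1], y[n-2], ..., y[1]
weighted = weights @ y[:0:-1] + (1 - alpha) ** (n - 1) * y[0]   # oldest point keeps the remainder
print(level, weighted)  # identical values
```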
The data are split into a training set and a test set; the latter is eventually compared against the forecasts.
This code defines a function ts_decompose that decomposes a time series into its main components: trend, seasonality, and residuals.
It uses seasonal_decompose from statsmodels with either an additive or a multiplicative model.
It then draws four subplots showing the original series, the trend, the seasonality, and the residuals, along with their means.
The function also accepts the seasonal period and suppresses warnings for cleaner output.
### Load the libraries
import itertools
import warnings
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import mean_squared_error, mean_absolute_error, mean_absolute_percentage_error
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
from statsmodels.tsa.seasonal import seasonal_decompose
import statsmodels.tsa.api as smt
warnings.filterwarnings('ignore')
def ts_decompose(y, model="additive", stationary=False, period=24):  # 'stationary' is accepted but currently unused
result = seasonal_decompose(y, model=model, period=period) # Pass period to seasonal_decompose
fig, axes = plt.subplots(4, 1, sharex=True, sharey=False)
fig.set_figheight(10)
fig.set_figwidth(15)
axes[0].set_title("Decomposition for " + model + " model")
axes[0].plot(y, label='Original ' + model)
axes[0].legend(loc='upper left')
axes[1].plot(result.trend, label='Trend')
axes[1].legend(loc='upper left')
axes[2].plot(result.seasonal, label='Seasonality & Mean: ' + str(round(result.seasonal.mean(), 4)))
axes[2].legend(loc='upper left')
axes[3].plot(result.resid, label='Residuals & Mean: ' + str(round(result.resid.mean(), 4)))
axes[3].legend(loc='upper left')
plt.show(block=True)
ts_decompose(data_ts, stationary=True)
Based on the function's output, the 'level', 'trend', and 'seasonality' components of the series were modeled successfully, and the distribution of the residuals is shown in the plot. The residuals scatter around zero and follow no clear trend, so we can infer that the series has an 'additive' character.
This code implements a Holt-Winters triple exponential smoothing (TES-HW) model using the statsmodels library to forecast the air-temperature time series measured at meteorological station A601 in southeastern Brazil.
It splits the data into training, validation, and test sets and produces rolling (moving-window) forecasts.
It evaluates the model with MAPE, MAE, RMSE, and MSE, and applies statistical tests (Ljung-Box, Wald-Wolfowitz, and Kolmogorov-Smirnov) to the residuals.
Finally, it plots the forecasts, residuals, and autocorrelation analysis, and stores the residuals in tables for later use.
The metric values and the interpretation of the normality and randomness tests are presented in separate tables.
# JDF-2025Mar09 - TES-HW with Rolling Improvement for Validation and Test Sets
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error
nHours = 24
nDaysYear = 365
NoYears = 3
backHorizon = nHours * nDaysYear * NoYears
data_ts = data_ori
data_ts = data_ts[-backHorizon:]
# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24
# Test: Last 24 values
test = data_ts.iloc[-test_size:]
# Validation: 24 values before test
validation = data_ts.iloc[-(test_size + val_size):-test_size]
# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]
# 2. Create a Third-Order Holt-Winters Exponential Smoothing model
def hw_forecast(train, steps, seasonal_periods=24, alpha=0.5, beta=0.5, gamma=0.5, seasonal='add'):
"""
Third-Order Holt-Winters Exponential Smoothing forecast.
Parameters:
train (pd.Series): Training data.
steps (int): Number of steps to forecast.
seasonal_periods (int): Number of periods in a season.
alpha (float): Smoothing parameter for level (0 < alpha < 1).
beta (float): Smoothing parameter for trend (0 < beta < 1).
gamma (float): Smoothing parameter for seasonality (0 < gamma < 1).
seasonal (str): Type of seasonality ('add' or 'mul').
Returns:
tuple: The forecast (pd.Series) and the fitted model.
"""
model = ExponentialSmoothing(
train,
trend='add',
seasonal=seasonal,
seasonal_periods=seasonal_periods
)
model_fit = model.fit(
smoothing_level=alpha,
smoothing_trend=beta,
smoothing_seasonal=gamma,
optimized=False
)
forecast = model_fit.forecast(steps=steps)
return forecast, model_fit
# 3. Rolling Forecast with a 1-hour window
alpha = 0.3 # Smoothing parameter for level
beta = 0.1 # Smoothing parameter for trend
gamma = 0.3 # Smoothing parameter for seasonality
seasonal_periods = 24 # Assuming hourly data with daily seasonality (24 hours)
window_size = 1 # Rolling window size in hours
# Initialize lists to store rolling forecasts and actuals for Validation and Test
val_rolling_forecasts = []
val_rolling_actuals = []
test_rolling_forecasts = []
test_rolling_actuals = []
# Perform rolling forecasts for Validation set
for i in range(0, len(validation), window_size):
# Update training data with the most recent window
current_train = pd.concat([train, validation.iloc[:i]])
# Forecast for the next window
current_forecast, _ = hw_forecast(current_train, steps=window_size, alpha=alpha, beta=beta, gamma=gamma, seasonal='add')
# Append forecasts and actuals for Validation
val_rolling_forecasts.extend(current_forecast.values)
val_rolling_actuals.extend(validation.iloc[i:i + window_size].values)
# Convert lists to pandas Series for Validation
val_rolling_forecasts = pd.Series(val_rolling_forecasts, index=validation.index[:len(val_rolling_forecasts)])
val_rolling_actuals = pd.Series(val_rolling_actuals, index=validation.index[:len(val_rolling_actuals)])
# Perform rolling forecasts for Test set
combined_train_val = pd.concat([train, validation]) # Combine training and validation data
for i in range(0, len(test), window_size):
# Update training data with the most recent window
current_train = pd.concat([combined_train_val, test.iloc[:i]])
# Forecast for the next window
current_forecast, _ = hw_forecast(current_train, steps=window_size, alpha=alpha, beta=beta, gamma=gamma, seasonal='add')
# Append forecasts and actuals for Test
test_rolling_forecasts.extend(current_forecast.values)
test_rolling_actuals.extend(test.iloc[i:i + window_size].values)
# Convert lists to pandas Series for Test
test_rolling_forecasts = pd.Series(test_rolling_forecasts, index=test.index[:len(test_rolling_forecasts)])
test_rolling_actuals = pd.Series(test_rolling_actuals, index=test.index[:len(test_rolling_actuals)])
# 4. Calculate metrics for rolling forecasts
def calculate_metrics(actual, forecast):
"""
Calculate MAPE, MAE, RMSE, MSE, and Ljung-Box p-value.
"""
# Align indices and drop rows with NaN values
aligned_actual, aligned_forecast = actual.align(forecast, join='inner')
aligned_actual = aligned_actual.dropna()
aligned_forecast = aligned_forecast.dropna()
# Avoid division by zero
aligned_actual = aligned_actual.replace(0, 1e-9)
# Calculate metrics
metrics = {
'MAPE': np.mean(np.abs((aligned_actual - aligned_forecast) / aligned_actual)) * 100,
'MAE': mean_absolute_error(aligned_actual, aligned_forecast),
'RMSE': np.sqrt(mean_squared_error(aligned_actual, aligned_forecast)),
'MSE': mean_squared_error(aligned_actual, aligned_forecast),
}
return metrics
# Rolling forecast metrics for Validation
val_rolling_metrics = calculate_metrics(val_rolling_actuals, val_rolling_forecasts)
val_rolling_residuals = val_rolling_actuals - val_rolling_forecasts
# Drop NaN values from residuals before Ljung-Box test
val_residuals_cleaned = val_rolling_residuals.dropna()
if len(val_residuals_cleaned) > 0:
val_rolling_metrics['Ljung-Box p-value'] = acorr_ljungbox(val_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]
else:
val_rolling_metrics['Ljung-Box p-value'] = np.nan # Handle case where all residuals are NaN
# Rolling forecast metrics for Test
test_rolling_metrics = calculate_metrics(test_rolling_actuals, test_rolling_forecasts)
test_rolling_residuals = test_rolling_actuals - test_rolling_forecasts
# Drop NaN values from residuals before Ljung-Box test
test_residuals_cleaned = test_rolling_residuals.dropna()
if len(test_residuals_cleaned) > 0:
test_rolling_metrics['Ljung-Box p-value'] = acorr_ljungbox(test_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]
else:
test_rolling_metrics['Ljung-Box p-value'] = np.nan # Handle case where all residuals are NaN
# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_rolling_residuals)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_rolling_residuals)
# Add randomness test results to metrics
val_rolling_metrics['Randomness p-value'] = val_runs_pvalue
test_rolling_metrics['Randomness p-value'] = test_runs_pvalue
# Perform Kolmogorov-Smirnov Normality Test
val_ks_stat, val_ks_pvalue = stats.kstest(val_rolling_residuals, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_rolling_residuals, 'norm')
# Add normality test results to metrics
val_rolling_metrics['KS-Normality p-value'] = val_ks_pvalue
test_rolling_metrics['KS-Normality p-value'] = test_ks_pvalue
# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_rolling_metrics, test_rolling_metrics], index=['Validation', 'Test'])
# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])
# Print the tables
print("Metrics Table:")
display(metrics_table.round(4)) # Use display() to render the DataFrame
print("\nTest Interpretation Table:")
display(interpretation_table.round(4)) # Use display() to render the DataFrame
# 5. Plot Training, Validation, Test, and Rolling Forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(val_rolling_forecasts.index, val_rolling_forecasts, label='Validation Rolling Forecast', color='orange', linestyle='--')
plt.plot(test_rolling_forecasts.index, test_rolling_forecasts, label='Test Rolling Forecast', color='purple', linestyle='--')
plt.title(f'Holt-Winters Rolling Forecast (Alpha={alpha}, Beta={beta}, Gamma={gamma})\nValidation MAE: {val_rolling_metrics["MAE"]:.2f}, Test MAE: {test_rolling_metrics["MAE"]:.2f}')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 6. Residual Analysis: Validation and Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Quadrant 0,0: Residuals vs Time (Validation)
axes[0, 0].plot(val_rolling_residuals.index, val_rolling_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Quadrant 0,1: Residuals Density and Histogram (Validation)
val_rolling_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', color='blue', ax=axes[0, 1])
val_rolling_residuals.plot(kind='kde', label='Validation Residuals Density', color='blue', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Quadrant 1,0: Q-Q Plot (Validation)
stats.probplot(val_rolling_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)
# Quadrant 1,1: ACF (Validation)
plot_acf(val_residuals_cleaned, lags=min(20, len(val_residuals_cleaned) - 1), ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 7. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Quadrant 0,0: Residuals vs Time (Test)
axes[0, 0].plot(test_rolling_residuals.index, test_rolling_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Quadrant 0,1: Residuals Density and Histogram (Test)
test_rolling_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', color='red', ax=axes[0, 1])
test_rolling_residuals.plot(kind='kde', label='Test Residuals Density', color='red', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Quadrant 1,0: Q-Q Plot (Test)
stats.probplot(test_rolling_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)
# Quadrant 1,1: ACF (Test)
plot_acf(test_residuals_cleaned, lags=min(20, len(test_residuals_cleaned) - 1), ax=axes[1, 1], title='ACF of Test Residuals', color='red')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Add the model: TES-HW
tempSeries = pd.Series(val_rolling_residuals)
ResTableVal.loc[4] = {"ModelName": "TES-HW", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_rolling_residuals)
ResTableTest.loc[4] = {"ModelName": "TES-HW", "ResValues": tempSeries.values}
Metrics Table:
| | MAPE | MAE | RMSE | MSE | Ljung-Box p-value | Randomness p-value | KS-Normality p-value |
|---|---|---|---|---|---|---|---|
| Validation | 5.1354 | 1.0544 | 1.4955 | 2.2364 | 0.0001 | 0.008 | 0.6577 |
| Test | 3.7055 | 0.7754 | 0.9829 | 0.9661 | 0.0002 | 0.094 | 0.2734 |
Test Interpretation Table:
| | Randomness | KS-Normality |
|---|---|---|
| Validation | Not Random | Normal |
| Test | Random | Normal |
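The "Not Random" verdict above comes from the Wald-Wolfowitz runs test on the residuals. As a minimal sketch of how `runstest_1samp` behaves (toy data, not the station series): a strictly alternating sequence has far more sign runs than chance would produce, so the test rejects randomness.

```python
import numpy as np
from statsmodels.sandbox.stats.runs import runstest_1samp

# Residual-like toy series that alternates sign on every step:
# 100 runs where roughly 51 are expected, so the z-statistic is large
# and the p-value is essentially zero -> "Not Random".
alternating = np.array([1.0, -1.0] * 50)
stat, pvalue = runstest_1samp(alternating)
print(stat, pvalue)
```

A long block of positives followed by a long block of negatives (too few runs) is rejected for the symmetric reason; only residuals whose signs flip at a chance-like rate pass.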
Below is the Python code for the RNN model.
This code implements a Recurrent Neural Network (RNN) to forecast the air-temperature time series at station A601 in southeastern Brazil. The network is built with TensorFlow/Keras; statsmodels is used only for the residual diagnostics.
It splits the data into training, validation, and test sets, and uses a time window (look_back) to predict the next value.
The model is trained on normalized data and evaluated with metrics such as MAPE, MAE, RMSE, and MSE. In addition, statistical tests (Ljung-Box, Wald-Wolfowitz, and Kolmogorov-Smirnov) are applied to the residuals.
Plots of the forecasts, residuals, and autocorrelation are generated, and the residuals are stored in tables for later use.
The metric values and the interpretation of the normality and randomness tests are presented in separate tables.
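Before the full listing, the look_back windowing described above can be seen in isolation. A minimal sketch on a toy series (look_back of 3 here; the notebook uses 6): each training sample is a window of consecutive values, and the target is the value that immediately follows it.

```python
import numpy as np
import pandas as pd

def make_windows(series: pd.Series, look_back: int):
    """Turn a series into (X, y) pairs: each row of X holds `look_back`
    consecutive values and y is the value that follows them."""
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series.iloc[i:i + look_back].values)
        y.append(series.iloc[i + look_back])
    return np.array(X), np.array(y)

s = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
X, y = make_windows(s, look_back=3)
print(X)  # [[10. 11. 12.], [11. 12. 13.]]
print(y)  # [13. 14.]
```

Note that a series of length n yields only n − look_back samples, which is why the validation and test slices below carry a few extra leading values.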
# JDF-2025Mar09 - RNN
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense
# 0. Set data_ts with the time series original values
data_ts = data_ori
# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24
# Test: last 24 values, plus the 6 preceding values (look_back) as warm-up
test = data_ts.iloc[-test_size-6:]
# Validation: the 24 values before test, plus the 6 preceding values as warm-up
validation = data_ts.iloc[-(test_size + val_size)-6:-test_size]
# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]
# 2. Prepare data for RNN
def create_dataset(data, look_back=6):
X, y = [], []
if len(data) > look_back:
for i in range(len(data) - look_back):
X.append(data.iloc[i:i + look_back].values)
y.append(data.iloc[i + look_back])
return np.array(X), np.array(y)
# Normalize the data
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train.values.reshape(-1, 1))
validation_scaled = scaler.transform(validation.values.reshape(-1, 1))
test_scaled = scaler.transform(test.values.reshape(-1, 1))
# Create datasets
look_back = 6 # Use the last 6 hours to predict the next hour
X_train, y_train = create_dataset(pd.Series(train_scaled.flatten()), look_back)
X_val, y_val = create_dataset(pd.Series(validation_scaled.flatten()), look_back)
X_test, y_test = create_dataset(pd.Series(test_scaled.flatten()), look_back)
# Reshape input to be [samples, time steps, features]
print(X_val.shape)
print(X_test.shape)
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_val = X_val.reshape(X_val.shape[0], X_val.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
# 3. Build the RNN model
def build_rnn_model(input_shape):
"""
Build an RNN model.
Parameters:
input_shape (tuple): Shape of the input data (time steps, features).
Returns:
model: Compiled RNN model.
"""
model = Sequential()
model.add(SimpleRNN(50, input_shape=input_shape, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
return model
# Build and train the model
model = build_rnn_model((X_train.shape[1], X_train.shape[2]))
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val), verbose=1)
# 4. Forecast for validation and test
def forecast_rnn(model, data, look_back):
"""
Forecast future values using the RNN model.
Parameters:
model: Trained RNN model.
data (np.array): Input data.
look_back (int): Number of previous time steps to use as input.
Returns:
forecast: Forecasted values.
"""
forecast = []
for i in range(len(data) - look_back):
X = data[i:i + look_back].reshape(1, look_back, 1)
y_pred = model.predict(X, verbose=0)
forecast.append(y_pred[0, 0])
return np.array(forecast)
# Forecast for validation
val_forecast_scaled = forecast_rnn(model, validation_scaled, look_back)
val_forecast = scaler.inverse_transform(val_forecast_scaled.reshape(-1, 1)).flatten()
val_forecast = pd.Series(val_forecast, index=validation.index[look_back:])
# Forecast for test
test_forecast_scaled = forecast_rnn(model, test_scaled, look_back)
test_forecast = scaler.inverse_transform(test_forecast_scaled.reshape(-1, 1)).flatten()
test_forecast = pd.Series(test_forecast, index=test.index[look_back:])
# 5. Calculate metrics
def calculate_metrics(actual, forecast):
"""
Calculate MAPE, MAE, RMSE, MSE, and Ljung-Box p-value.
"""
actual = actual.replace(0, 1e-9) # Avoid division by zero
metrics = {
'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
'MAE': mean_absolute_error(actual, forecast),
'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
'MSE': mean_squared_error(actual, forecast),
}
return metrics
# Validation metrics
val_metrics = calculate_metrics(validation.iloc[look_back:], val_forecast)
val_residuals = validation.iloc[look_back:] - val_forecast
# Drop NaN values from residuals before Ljung-Box test
val_residuals_cleaned = val_residuals.dropna()
if len(val_residuals_cleaned) > 0:
val_metrics['Ljung-Box p-value'] = acorr_ljungbox(val_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]
else:
val_metrics['Ljung-Box p-value'] = np.nan # Handle case where all residuals are NaN
# Test metrics
test_metrics = calculate_metrics(test.iloc[look_back:], test_forecast)
test_residuals = test.iloc[look_back:] - test_forecast
# Drop NaN values from residuals before Ljung-Box test
test_residuals_cleaned = test_residuals.dropna()
if len(test_residuals_cleaned) > 0:
test_metrics['Ljung-Box p-value'] = acorr_ljungbox(test_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]
else:
test_metrics['Ljung-Box p-value'] = np.nan # Handle case where all residuals are NaN
# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_residuals)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_residuals)
# Add randomness test results to metrics
val_metrics['Randomness p-value'] = val_runs_pvalue
test_metrics['Randomness p-value'] = test_runs_pvalue
# Perform Kolmogorov-Smirnov Normality Test
val_ks_stat, val_ks_pvalue = stats.kstest(val_residuals, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_residuals, 'norm')
# Add normality test results to metrics
val_metrics['KS-Normality p-value'] = val_ks_pvalue
test_metrics['KS-Normality p-value'] = test_ks_pvalue
# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_metrics, test_metrics], index=['Validation', 'Test'])
# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])
# Print the tables
print("Metrics Table:")
display(metrics_table.round(4)) # Use display() to render the DataFrame
print("\nTest Interpretation Table:")
display(interpretation_table.round(4)) # Use display() to render the DataFrame
# 6. Plot Training (last 48 observations), Validation, Test, and Forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(val_forecast.index, val_forecast, label='Validation Forecast', color='orange', linestyle='--')
plt.plot(test_forecast.index, test_forecast, label='Test Forecast', color='purple', linestyle='--')
plt.title(f'RNN Forecast\nValidation MAE: {val_metrics["MAE"]:.2f}, Test MAE: {test_metrics["MAE"]:.2f}')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 7. Residual Analysis: Validation (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Quadrant 0,0: Residuals vs Time (Validation)
axes[0, 0].plot(val_residuals.index, val_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Quadrant 0,1: Residuals Density and Histogram (Validation)
val_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', color='blue', ax=axes[0, 1])
val_residuals.plot(kind='kde', label='Validation Residuals Density', color='blue', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Quadrant 1,0: Q-Q Plot (Validation)
stats.probplot(val_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)
# Quadrant 1,1: ACF (Validation)
plot_acf(val_residuals_cleaned, lags=min(20, len(val_residuals_cleaned) - 1), ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 8. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Quadrant 0,0: Residuals vs Time (Test)
axes[0, 0].plot(test_residuals.index, test_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Quadrant 0,1: Residuals Density and Histogram (Test)
test_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', color='red', ax=axes[0, 1])
test_residuals.plot(kind='kde', label='Test Residuals Density', color='red', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Quadrant 1,0: Q-Q Plot (Test)
stats.probplot(test_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)
# Quadrant 1,1: ACF (Test)
plot_acf(test_residuals_cleaned, lags=min(20, len(test_residuals_cleaned) - 1), ax=axes[1, 1], title='ACF of Test Residuals', color='red')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Add the model: RNN
tempSeries = pd.Series(val_residuals)
ResTableVal.loc[5] = {"ModelName": "RNN", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_residuals)
ResTableTest.loc[5] = {"ModelName": "RNN", "ResValues": tempSeries.values}
(24, 6)
(24, 6)
Epoch 1/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 14s 2ms/step - loss: 0.0027 - val_loss: 5.2126e-04
Epoch 2/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 21s 2ms/step - loss: 0.0010 - val_loss: 7.7277e-04
Epoch 3/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 13s 2ms/step - loss: 0.0011 - val_loss: 5.4140e-04
Epoch 4/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 14s 3ms/step - loss: 0.0010 - val_loss: 5.4864e-04
Epoch 5/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 19s 2ms/step - loss: 9.9340e-04 - val_loss: 6.6526e-04
Epoch 6/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 13s 2ms/step - loss: 9.9997e-04 - val_loss: 5.6443e-04
Epoch 7/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 12s 2ms/step - loss: 0.0010 - val_loss: 5.8493e-04
Epoch 8/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 13s 2ms/step - loss: 9.9621e-04 - val_loss: 5.9492e-04
Epoch 9/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 13s 2ms/step - loss: 0.0010 - val_loss: 5.6386e-04
Epoch 10/10
5375/5375 ━━━━━━━━━━━━━━━━━━━━ 20s 2ms/step - loss: 9.7589e-04 - val_loss: 5.4842e-04
Metrics Table:
| | MAPE | MAE | RMSE | MSE | Ljung-Box p-value | Randomness p-value | KS-Normality p-value |
|---|---|---|---|---|---|---|---|
| Validation | 2.9711 | 0.6226 | 0.7658 | 0.5864 | 0.4127 | 0.5312 | 0.6104 |
| Test | 2.8239 | 0.5768 | 0.7151 | 0.5113 | 0.1757 | 0.8347 | 0.2599 |
Test Interpretation Table:
| | Randomness | KS-Normality |
|---|---|---|
| Validation | Random | Normal |
| Test | Random | Normal |
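One caveat on the KS column above: `stats.kstest(x, 'norm')` compares against a *standard* normal N(0, 1), so residuals that are normal but have a noticeably different scale can be flagged "Not Normal" even when their shape is fine. A hedged sketch on synthetic data (not the station residuals) showing the effect of standardizing first:

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)
residuals = rng.normal(loc=0.0, scale=5.0, size=200)  # normal, but sd = 5

# Against N(0, 1) the scale mismatch dominates and the test rejects.
_, p_raw = stats.kstest(residuals, 'norm')

# After standardizing, only the shape is tested.
standardized = (residuals - residuals.mean()) / residuals.std(ddof=1)
_, p_std = stats.kstest(standardized, 'norm')

print(p_raw, p_std)
```

The notebook's residuals happen to be small enough in scale that the raw test still passes, but standardizing (or using a test that estimates the parameters, such as Shapiro-Wilk) is the safer habit.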
Below is the Python code for the MLP model.
This code implements a Multilayer Perceptron (MLP) to forecast the time series of measurements taken at station A601 in southeastern Brazil.
It splits the data into training, validation, and test sets, and uses a time window (look_back) to predict the next value.
The model is trained on normalized data and evaluated with metrics such as MAPE, MAE, RMSE, and MSE. In addition, statistical tests (Ljung-Box, Wald-Wolfowitz, and Kolmogorov-Smirnov) are applied to the residuals.
Plots of the forecasts, residuals, and autocorrelation are generated, and the residuals are stored in tables for later use.
The metric values and the interpretation of the normality and randomness tests are presented in separate tables.
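The only structural difference from the RNN pipeline is the input shape: the RNN expects 3-D input of [samples, time steps, features], while the MLP consumes each window as a flat feature vector of [samples, look_back]. A small shape sketch on toy arrays (assumed look_back of 6, matching the notebook):

```python
import numpy as np

look_back = 6
n_samples = 4
windows = np.arange(n_samples * look_back, dtype=float).reshape(n_samples, look_back)

# RNN input: one feature per time step -> (samples, time steps, 1)
X_rnn = windows.reshape(n_samples, look_back, 1)
# MLP input: the window itself is the feature vector -> (samples, look_back)
X_mlp = windows

print(X_rnn.shape)  # (4, 6, 1)
print(X_mlp.shape)  # (4, 6)
```

Both layouts hold exactly the same numbers; only the axis the model interprets as "time" versus "features" changes.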
# JDF-2025Mar09 - MLP
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# 0. Set the original data in data_ts
data_ts = data_ori
# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24
# Test: last 24 values, plus the 6 preceding values (look_back) as warm-up
test = data_ts.iloc[-test_size-6:]
# Validation: the 24 values before test, plus the 6 preceding values as warm-up
validation = data_ts.iloc[-(test_size + val_size)-6:-test_size]
# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]
# 2. Prepare data for MLP
def create_dataset(data, look_back=24):
"""
Create sequences of data for MLP input.
Parameters:
data (pd.Series): Input time series.
look_back (int): Number of previous time steps to use as input.
Returns:
X, y: Input features and target values.
"""
X, y = [], []
for i in range(len(data) - look_back):
X.append(data.iloc[i:i + look_back].values)
y.append(data.iloc[i + look_back])
return np.array(X), np.array(y)
# Normalize the data
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train.values.reshape(-1, 1))
validation_scaled = scaler.transform(validation.values.reshape(-1, 1))
test_scaled = scaler.transform(test.values.reshape(-1, 1))
# Create datasets
look_back = 6 # Use the last 6 hours to predict the next hour
X_train, y_train = create_dataset(pd.Series(train_scaled.flatten()), look_back)
X_val, y_val = create_dataset(pd.Series(validation_scaled.flatten()), look_back)
X_test, y_test = create_dataset(pd.Series(test_scaled.flatten()), look_back)
# Reshape input to be [samples, features] for MLP
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1])
X_val = X_val.reshape(X_val.shape[0], X_val.shape[1])
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1])
# 3. Build the MLP model
def build_mlp_model(input_shape):
"""
Build an MLP model.
Parameters:
input_shape (int): Number of input features.
Returns:
model: Compiled MLP model.
"""
model = Sequential()
model.add(Dense(50, input_dim=input_shape, activation='relu')) # First hidden layer
model.add(Dense(25, activation='relu')) # Second hidden layer
model.add(Dense(1)) # Output layer
model.compile(optimizer='adam', loss='mse')
return model
# Build and train the model
model = build_mlp_model(X_train.shape[1])
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val), verbose=1)
# 4. Forecast for validation and test
def forecast_mlp(model, data, look_back):
"""
Forecast future values using the MLP model.
Parameters:
model: Trained MLP model.
data (np.array): Input data.
look_back (int): Number of previous time steps to use as input.
Returns:
forecast: Forecasted values.
"""
forecast = []
for i in range(len(data) - look_back):
X = data[i:i + look_back].reshape(1, look_back)
y_pred = model.predict(X, verbose=0)
forecast.append(y_pred[0, 0])
return np.array(forecast)
# Forecast for validation
val_forecast_scaled = forecast_mlp(model, validation_scaled, look_back)
val_forecast = scaler.inverse_transform(val_forecast_scaled.reshape(-1, 1)).flatten()
val_forecast = pd.Series(val_forecast, index=validation.index[look_back:])
# Forecast for test
test_forecast_scaled = forecast_mlp(model, test_scaled, look_back)
test_forecast = scaler.inverse_transform(test_forecast_scaled.reshape(-1, 1)).flatten()
test_forecast = pd.Series(test_forecast, index=test.index[look_back:])
# 5. Calculate metrics
def calculate_metrics(actual, forecast):
"""
Calculate MAPE, MAE, RMSE, MSE, and Ljung-Box p-value.
"""
actual = actual.replace(0, 1e-9) # Avoid division by zero
metrics = {
'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
'MAE': mean_absolute_error(actual, forecast),
'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
'MSE': mean_squared_error(actual, forecast),
}
return metrics
# Validation metrics
val_metrics = calculate_metrics(validation.iloc[look_back:], val_forecast)
val_residuals = validation.iloc[look_back:] - val_forecast
# Drop NaN values from residuals before Ljung-Box test
val_residuals_cleaned = val_residuals.dropna()
if len(val_residuals_cleaned) > 0:
val_metrics['Ljung-Box p-value'] = acorr_ljungbox(val_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]
else:
val_metrics['Ljung-Box p-value'] = np.nan # Handle case where all residuals are NaN
# Test metrics
test_metrics = calculate_metrics(test.iloc[look_back:], test_forecast)
test_residuals = test.iloc[look_back:] - test_forecast
# Drop NaN values from residuals before Ljung-Box test
test_residuals_cleaned = test_residuals.dropna()
if len(test_residuals_cleaned) > 0:
test_metrics['Ljung-Box p-value'] = acorr_ljungbox(test_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]
else:
test_metrics['Ljung-Box p-value'] = np.nan # Handle case where all residuals are NaN
# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_residuals)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_residuals)
# Add randomness test results to metrics
val_metrics['Randomness p-value'] = val_runs_pvalue
test_metrics['Randomness p-value'] = test_runs_pvalue
# Perform Kolmogorov-Smirnov Normality Test
val_ks_stat, val_ks_pvalue = stats.kstest(val_residuals, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_residuals, 'norm')
# Add normality test results to metrics
val_metrics['KS-Normality p-value'] = val_ks_pvalue
test_metrics['KS-Normality p-value'] = test_ks_pvalue
# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_metrics, test_metrics], index=['Validation', 'Test'])
# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])
# Print the tables
print("Metrics Table:")
display(metrics_table.round(4)) # Use display() to render the DataFrame
print("\nTest Interpretation Table:")
display(interpretation_table.round(4)) # Use display() to render the DataFrame
# 6. Plot Training (last 48 observations), Validation, Test, and Forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(val_forecast.index, val_forecast, label='Validation Forecast', color='orange', linestyle='--')
plt.plot(test_forecast.index, test_forecast, label='Test Forecast', color='purple', linestyle='--')
plt.title(f'MLP Forecast\nValidation MAE: {val_metrics["MAE"]:.2f}, Test MAE: {test_metrics["MAE"]:.2f}')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 7. Residual Analysis: Validation (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Quadrant 0,0: Residuals vs Time (Validation)
axes[0, 0].plot(val_residuals.index, val_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Quadrant 0,1: Residuals Density and Histogram (Validation)
val_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', color='blue', ax=axes[0, 1])
val_residuals.plot(kind='kde', label='Validation Residuals Density', color='blue', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Quadrant 1,0: Q-Q Plot (Validation)
stats.probplot(val_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)
# Quadrant 1,1: ACF (Validation)
plot_acf(val_residuals_cleaned, lags=min(20, len(val_residuals_cleaned) - 1), ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 8. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Quadrant 0,0: Residuals vs Time (Test)
axes[0, 0].plot(test_residuals.index, test_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Quadrant 0,1: Residuals Density and Histogram (Test)
test_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', color='red', ax=axes[0, 1])
test_residuals.plot(kind='kde', label='Test Residuals Density', color='red', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Quadrant 1,0: Q-Q Plot (Test)
stats.probplot(test_residuals, dist="norm", plot=axes[1, 0])
#axes[1, 0].set_title('Test Residuals Q-Q Plot')
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)
# Quadrant 1,1: ACF (Test)
plot_acf(test_residuals_cleaned, lags=min(20, len(test_residuals_cleaned) - 1), ax=axes[1, 1], title='ACF of Test Residuals', color='red')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Add the model: MLP
tempSeries = pd.Series(val_residuals)
ResTableVal.loc[6] = {"ModelName": "MLP", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_residuals)
ResTableTest.loc[6] = {"ModelName": "MLP", "ResValues": tempSeries.values}
Epoch 1/10  5375/5375 ━━━━━━━━━━━━━━━━━━━━ 9s 2ms/step - loss: 0.0077 - val_loss: 5.1756e-04
Epoch 2/10  5375/5375 ━━━━━━━━━━━━━━━━━━━━ 10s 2ms/step - loss: 0.0011 - val_loss: 5.7394e-04
Epoch 3/10  5375/5375 ━━━━━━━━━━━━━━━━━━━━ 7s 1ms/step - loss: 0.0010 - val_loss: 7.4281e-04
Epoch 4/10  5375/5375 ━━━━━━━━━━━━━━━━━━━━ 11s 1ms/step - loss: 0.0010 - val_loss: 5.4199e-04
Epoch 5/10  5375/5375 ━━━━━━━━━━━━━━━━━━━━ 8s 1ms/step - loss: 0.0010 - val_loss: 6.0700e-04
Epoch 6/10  5375/5375 ━━━━━━━━━━━━━━━━━━━━ 10s 1ms/step - loss: 0.0010 - val_loss: 5.6668e-04
Epoch 7/10  5375/5375 ━━━━━━━━━━━━━━━━━━━━ 8s 1ms/step - loss: 0.0010 - val_loss: 5.5554e-04
Epoch 8/10  5375/5375 ━━━━━━━━━━━━━━━━━━━━ 11s 2ms/step - loss: 0.0010 - val_loss: 5.7778e-04
Epoch 9/10  5375/5375 ━━━━━━━━━━━━━━━━━━━━ 8s 1ms/step - loss: 0.0010 - val_loss: 6.5952e-04
Epoch 10/10  5375/5375 ━━━━━━━━━━━━━━━━━━━━ 8s 1ms/step - loss: 0.0010 - val_loss: 7.1140e-04
Metrics Table:
| | MAPE | MAE | RMSE | MSE | Ljung-Box p-value | Randomness p-value | KS-Normality p-value |
|---|---|---|---|---|---|---|---|
| Validation | 3.3882 | 0.7041 | 0.8722 | 0.7607 | 0.6299 | 0.5312 | 0.0298 |
| Test | 3.1002 | 0.6255 | 0.7725 | 0.5967 | 0.2098 | 0.8347 | 0.0318 |
Test Interpretation Table:
| | Randomness | KS-Normality |
|---|---|---|
| Validation | Random | Not Normal |
| Test | Random | Not Normal |
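A caveat on the KS-Normality column: `stats.kstest(residuals, 'norm')`, as used in these cells, compares the residuals against a *standard* normal N(0, 1), not against a normal with the residuals' own mean and variance. When the residual scale is far from 1, the scale mismatch alone can drive the test statistic. A minimal sketch on synthetic data (the series below is illustrative, not station data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic "residuals": genuinely normal, but with std 0.5 instead of 1
residuals = rng.normal(loc=0.0, scale=0.5, size=200)

# Direct comparison against N(0, 1), as done in the cells above
_, p_raw = stats.kstest(residuals, 'norm')

# Standardize first, then compare against N(0, 1)
standardized = (residuals - residuals.mean()) / residuals.std(ddof=1)
_, p_std = stats.kstest(standardized, 'norm')

print(p_raw, p_std)
```

Here `p_raw` comes out small even though the data are exactly normal, while the standardized version removes that artifact. Strictly speaking, testing against a normal with estimated parameters calls for a Lilliefors-type correction (`statsmodels.stats.diagnostic.lilliefors`).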
Below is the Python code for the LSTM model.
This code implements a Long Short-Term Memory (LSTM) model to forecast the measurements taken at station A601 in southeastern Brazil.
It splits the data into training, validation, and test sets, and uses a time window (look_back) to predict the next value.
The model is trained on normalized data and evaluated with metrics such as MAPE, MAE, RMSE, and MSE. In addition, statistical tests (Ljung-Box, Wald-Wolfowitz, and Kolmogorov-Smirnov) are applied to analyze the residuals.
Finally, plots of the forecasts, residuals, and autocorrelation analysis are generated, and the residuals are stored in tables for later use.
The metric results and the interpretation of the normality and randomness tests are presented in separate tables.
# JDF-2025Mar09 - LSTM
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
# 0. Set the original data in data_ts
data_ts = data_ori
# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24
# Test: last 24 values, preceded by 6 extra values (the look_back window defined below)
# so that the first test point has a full input window
test = data_ts.iloc[-test_size-6:]
# Validation: the 24 values before test, likewise padded with 6 extra leading values
validation = data_ts.iloc[-(test_size + val_size)-6:-test_size]
# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]
# 2. Prepare data for LSTM
def create_dataset(data, look_back=24):
"""
Create sequences of data for LSTM input.
Parameters:
data (pd.Series): Input time series.
look_back (int): Number of previous time steps to use as input.
Returns:
X, y: Input features and target values.
"""
X, y = [], []
for i in range(len(data) - look_back):
X.append(data.iloc[i:i + look_back].values)
y.append(data.iloc[i + look_back])
return np.array(X), np.array(y)
# Normalize the data
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train.values.reshape(-1, 1))
validation_scaled = scaler.transform(validation.values.reshape(-1, 1))
test_scaled = scaler.transform(test.values.reshape(-1, 1))
# Create datasets
look_back = 6 # Use the last 6 hours to predict the next hour
X_train, y_train = create_dataset(pd.Series(train_scaled.flatten()), look_back)
X_val, y_val = create_dataset(pd.Series(validation_scaled.flatten()), look_back)
X_test, y_test = create_dataset(pd.Series(test_scaled.flatten()), look_back)
# Reshape input to be [samples, time steps, features] for LSTM
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_val = X_val.reshape(X_val.shape[0], X_val.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
# 3. Build the LSTM model
def build_lstm_model(input_shape):
"""
Build an LSTM model.
Parameters:
input_shape (tuple): Shape of the input data (time steps, features).
Returns:
model: Compiled LSTM model.
"""
model = Sequential()
model.add(LSTM(50, input_shape=input_shape, activation='relu')) # LSTM layer
model.add(Dense(1)) # Output layer
model.compile(optimizer='adam', loss='mse')
return model
# Build and train the model
model = build_lstm_model((X_train.shape[1], X_train.shape[2]))
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val), verbose=1)
# 4. Forecast for validation and test
def forecast_lstm(model, data, look_back):
"""
Forecast future values using the LSTM model.
Parameters:
model: Trained LSTM model.
data (np.array): Input data.
look_back (int): Number of previous time steps to use as input.
Returns:
forecast: Forecasted values.
"""
forecast = []
for i in range(len(data) - look_back):
X = data[i:i + look_back].reshape(1, look_back, 1)
y_pred = model.predict(X, verbose=0)
forecast.append(y_pred[0, 0])
return np.array(forecast)
# Forecast for validation
val_forecast_scaled = forecast_lstm(model, validation_scaled, look_back)
val_forecast = scaler.inverse_transform(val_forecast_scaled.reshape(-1, 1)).flatten()
val_forecast = pd.Series(val_forecast, index=validation.index[look_back:])
# Forecast for test
test_forecast_scaled = forecast_lstm(model, test_scaled, look_back)
test_forecast = scaler.inverse_transform(test_forecast_scaled.reshape(-1, 1)).flatten()
test_forecast = pd.Series(test_forecast, index=test.index[look_back:])
# 5. Calculate metrics
def calculate_metrics(actual, forecast):
"""
Calculate MAPE, MAE, RMSE, MSE, and Ljung-Box p-value.
"""
actual = actual.replace(0, 1e-9) # Avoid division by zero
metrics = {
'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
'MAE': mean_absolute_error(actual, forecast),
'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
'MSE': mean_squared_error(actual, forecast),
}
return metrics
# Validation metrics
val_metrics = calculate_metrics(validation.iloc[look_back:], val_forecast)
val_residuals = validation.iloc[look_back:] - val_forecast
# Drop NaN values from residuals before Ljung-Box test
val_residuals_cleaned = val_residuals.dropna()
if len(val_residuals_cleaned) > 0:
val_metrics['Ljung-Box p-value'] = acorr_ljungbox(val_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]
else:
val_metrics['Ljung-Box p-value'] = np.nan # Handle case where all residuals are NaN
# Test metrics
test_metrics = calculate_metrics(test.iloc[look_back:], test_forecast)
test_residuals = test.iloc[look_back:] - test_forecast
# Drop NaN values from residuals before Ljung-Box test
test_residuals_cleaned = test_residuals.dropna()
if len(test_residuals_cleaned) > 0:
test_metrics['Ljung-Box p-value'] = acorr_ljungbox(test_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]
else:
test_metrics['Ljung-Box p-value'] = np.nan # Handle case where all residuals are NaN
# Perform Wald-Wolfowitz Randomness Test (on NaN-free residuals, as for Ljung-Box)
val_runs_stat, val_runs_pvalue = runstest_1samp(val_residuals_cleaned)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_residuals_cleaned)
# Add randomness test results to metrics
val_metrics['Randomness p-value'] = val_runs_pvalue
test_metrics['Randomness p-value'] = test_runs_pvalue
# Perform Kolmogorov-Smirnov Normality Test against N(0, 1), on NaN-free residuals
val_ks_stat, val_ks_pvalue = stats.kstest(val_residuals_cleaned, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_residuals_cleaned, 'norm')
# Add normality test results to metrics
val_metrics['KS-Normality p-value'] = val_ks_pvalue
test_metrics['KS-Normality p-value'] = test_ks_pvalue
# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_metrics, test_metrics], index=['Validation', 'Test'])
# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])
# Print the tables
print("Metrics Table:")
display(metrics_table.round(4)) # Use display() to render the DataFrame
print("\nTest Interpretation Table:")
display(interpretation_table.round(4)) # Use display() to render the DataFrame
# 6. Plot Training (last 48 observations), Validation, Test, and Forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(val_forecast.index, val_forecast, label='Validation Forecast', color='orange', linestyle='--')
plt.plot(test_forecast.index, test_forecast, label='Test Forecast', color='purple', linestyle='--')
plt.title(f'LSTM Forecast\nValidation MAE: {val_metrics["MAE"]:.2f}, Test MAE: {test_metrics["MAE"]:.2f}')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 7. Residual Analysis: Validation (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Quadrant 0,0: Residuals vs Time (Validation)
axes[0, 0].plot(val_residuals.index, val_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
#axes[0, 0].set_title('Validation Residuals vs Time')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Quadrant 0,1: Residuals Density and Histogram (Validation)
val_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', color='blue', ax=axes[0, 1])
val_residuals.plot(kind='kde', label='Validation Residuals Density', color='blue', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Quadrant 1,0: Q-Q Plot (Validation)
stats.probplot(val_residuals, dist="norm", plot=axes[1, 0])
#axes[1, 0].set_title('Validation Residuals Q-Q Plot')
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)
# Quadrant 1,1: ACF (Validation)
plot_acf(val_residuals_cleaned, lags=min(20, len(val_residuals_cleaned) - 1), ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 8. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Quadrant 0,0: Residuals vs Time (Test)
axes[0, 0].plot(test_residuals.index, test_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
#axes[0, 0].set_title('Test Residuals vs Time')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Quadrant 0,1: Residuals Density and Histogram (Test)
test_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', color='red', ax=axes[0, 1])
test_residuals.plot(kind='kde', label='Test Residuals Density', color='red', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Quadrant 1,0: Q-Q Plot (Test)
stats.probplot(test_residuals, dist="norm", plot=axes[1, 0])
#axes[1, 0].set_title('Test Residuals Q-Q Plot')
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)
# Quadrant 1,1: ACF (Test)
plot_acf(test_residuals_cleaned, lags=min(20, len(test_residuals_cleaned) - 1), ax=axes[1, 1], title='ACF of Test Residuals', color='red')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Add the model: LSTM
tempSeries = pd.Series(val_residuals)
ResTableVal.loc[7] = {"ModelName": "LSTM", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_residuals)
ResTableTest.loc[7] = {"ModelName": "LSTM", "ResValues": tempSeries.values}
Epoch 1/10  5375/5375 ━━━━━━━━━━━━━━━━━━━━ 22s 4ms/step - loss: 0.0093 - val_loss: 6.4560e-04
Epoch 2/10  5375/5375 ━━━━━━━━━━━━━━━━━━━━ 20s 4ms/step - loss: 0.0011 - val_loss: 6.1571e-04
Epoch 3/10  5375/5375 ━━━━━━━━━━━━━━━━━━━━ 20s 4ms/step - loss: 0.0011 - val_loss: 5.4599e-04
Epoch 4/10  5375/5375 ━━━━━━━━━━━━━━━━━━━━ 21s 4ms/step - loss: 0.0010 - val_loss: 5.8509e-04
Epoch 5/10  5375/5375 ━━━━━━━━━━━━━━━━━━━━ 42s 4ms/step - loss: 0.0010 - val_loss: 6.0400e-04
Epoch 6/10  5375/5375 ━━━━━━━━━━━━━━━━━━━━ 42s 4ms/step - loss: 0.0010 - val_loss: 7.4651e-04
Epoch 7/10  5375/5375 ━━━━━━━━━━━━━━━━━━━━ 39s 4ms/step - loss: 0.0010 - val_loss: 8.3446e-04
Epoch 8/10  5375/5375 ━━━━━━━━━━━━━━━━━━━━ 44s 4ms/step - loss: 9.9106e-04 - val_loss: 6.4935e-04
Epoch 9/10  5375/5375 ━━━━━━━━━━━━━━━━━━━━ 23s 4ms/step - loss: 9.9718e-04 - val_loss: 6.1355e-04
Epoch 10/10  5375/5375 ━━━━━━━━━━━━━━━━━━━━ 22s 4ms/step - loss: 9.7365e-04 - val_loss: 5.7034e-04
Metrics Table:
| | MAPE | MAE | RMSE | MSE | Ljung-Box p-value | Randomness p-value | KS-Normality p-value |
|---|---|---|---|---|---|---|---|
| Validation | 3.0971 | 0.6505 | 0.7809 | 0.6099 | 0.2904 | 0.5312 | 0.5811 |
| Test | 2.7527 | 0.5637 | 0.6992 | 0.4888 | 0.5200 | 0.8347 | 0.4878 |
Test Interpretation Table:
| | Randomness | KS-Normality |
|---|---|---|
| Validation | Random | Normal |
| Test | Random | Normal |
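The sliding-window scheme behind `create_dataset` is easiest to see on a toy series. The sketch below is standalone and uses hypothetical data, but reproduces the same function:

```python
import numpy as np
import pandas as pd

def create_dataset(data, look_back=24):
    # Slide a window of `look_back` values over the series; the value
    # immediately after each window is the prediction target.
    X, y = [], []
    for i in range(len(data) - look_back):
        X.append(data.iloc[i:i + look_back].values)
        y.append(data.iloc[i + look_back])
    return np.array(X), np.array(y)

series = pd.Series([10, 11, 12, 13, 14, 15])
X, y = create_dataset(series, look_back=3)

print(X.shape)  # (3, 3): three windows of length 3
print(y)        # [13 14 15]
```

Each row of `X` is one input window ([10, 11, 12], [11, 12, 13], [12, 13, 14]) and the matching entry of `y` is the value right after it; for the LSTM, the windows are additionally reshaped to (samples, look_back, 1).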
The code implements an ARIMA model to forecast a time series of meteorological data collected at station A601 in southeastern Brazil, splitting the data into training, validation, and test sets.
Data preparation:
The time series is split into three sets: training, validation, and test.
The frequency of the time series index is set to hourly ('H').
ARIMA modeling:
An ARIMA model with parameters (p, d, q) = (1, 1, 1) is defined and fitted to the training data.
Forecasts are produced for the validation and test sets.
Model evaluation:
Error metrics such as MAPE, MAE, RMSE, and MSE are computed to assess forecast accuracy.
Statistical tests (Ljung-Box, Wald-Wolfowitz, and Kolmogorov-Smirnov) are applied to the residuals to check their randomness and normality.
Visualization:
The training, validation, and test data are plotted together with the forecasts.
The residuals are analyzed and visualized with histograms, density plots, Q-Q plots, and the ACF (autocorrelation function).
Results:
Tables present the evaluation metrics and the interpretation of the statistical tests.
In short, the code fits an ARIMA model to forecast the temperature over the next 24 hours from the given time series, evaluates its performance with metrics and statistical tests, and visualizes the results and residuals for analysis.
The metric results and the interpretation of the normality and randomness tests are presented in separate tables.
# JDF-2025Mar12 - ARIMA
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error
# 0. Set the original data in data_ts
data_ts = data_ori
# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24
# Set the frequency of the datetime index
data_ts = data_ts.asfreq('H')
# Test: Last 24 values
test = data_ts.iloc[-test_size:]
# Validation: 24 values before test
validation = data_ts.iloc[-(test_size + val_size):-test_size]
# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]
# 2. Build and fit the ARIMA model
def fit_arima_model(data, order):
"""
Fit an ARIMA model to the data.
Parameters:
data (pd.Series): Input time series.
order (tuple): (p, d, q) parameters of the ARIMA model.
Returns:
model: Fitted ARIMA model.
"""
model = ARIMA(data, order=order)
return model.fit()
# Fit ARIMA model to training data
arima_order = (1, 1, 1) # Adjust the (p, d, q) order as needed
arima_model = fit_arima_model(train, arima_order)
# 3. Forecast
def forecast_arima(model, steps):
"""
Forecast future values using the ARIMA model.
Parameters:
model: Fitted ARIMA model.
steps (int): Number of steps to forecast.
Returns:
forecast: Forecasted values as a Pandas Series.
"""
forecast = model.forecast(steps=steps)
return forecast
# Forecast for validation and test
val_forecast = forecast_arima(arima_model, len(validation))
combined_train = pd.concat([train, validation])
arima_model = fit_arima_model(combined_train, arima_order)
test_forecast = forecast_arima(arima_model, len(test))
# 4. Calculate metrics
def calculate_metrics(actual, forecast):
"""
Calculate MAPE, MAE, RMSE, MSE, and Ljung-Box p-value.
"""
actual = actual.replace(0, 1e-9) # Avoid division by zero
metrics = {
'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
'MAE': mean_absolute_error(actual, forecast),
'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
'MSE': mean_squared_error(actual, forecast),
}
return metrics
# Validation metrics
val_metrics = calculate_metrics(validation, val_forecast)
val_residuals = validation - val_forecast
# Drop NaN values from residuals before Ljung-Box test
val_residuals_cleaned = val_residuals.dropna()
if len(val_residuals_cleaned) > 0:
val_metrics['Ljung-Box p-value'] = acorr_ljungbox(val_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]
else:
val_metrics['Ljung-Box p-value'] = np.nan # Handle case where all residuals are NaN
# Test metrics
test_metrics = calculate_metrics(test, test_forecast)
test_residuals = test - test_forecast
# Drop NaN values from residuals before Ljung-Box test
test_residuals_cleaned = test_residuals.dropna()
if len(test_residuals_cleaned) > 0:
test_metrics['Ljung-Box p-value'] = acorr_ljungbox(test_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]
else:
test_metrics['Ljung-Box p-value'] = np.nan # Handle case where all residuals are NaN
# Perform Wald-Wolfowitz Randomness Test (on NaN-free residuals, as for Ljung-Box)
val_runs_stat, val_runs_pvalue = runstest_1samp(val_residuals_cleaned)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_residuals_cleaned)
# Add randomness test results to metrics
val_metrics['Randomness p-value'] = val_runs_pvalue
test_metrics['Randomness p-value'] = test_runs_pvalue
# Perform Kolmogorov-Smirnov Normality Test against N(0, 1), on NaN-free residuals
val_ks_stat, val_ks_pvalue = stats.kstest(val_residuals_cleaned, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_residuals_cleaned, 'norm')
# Add normality test results to metrics
val_metrics['KS-Normality p-value'] = val_ks_pvalue
test_metrics['KS-Normality p-value'] = test_ks_pvalue
# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_metrics, test_metrics], index=['Validation', 'Test'])
# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])
# Print the tables
print("Metrics Table:")
display(metrics_table.round(4)) # Use display() to render the DataFrame
print("\nTest Interpretation Table:")
display(interpretation_table.round(4)) # Use display() to render the DataFrame
# 5. Plot Training (last 48 observations), Validation, Test, and Forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(validation.index, val_forecast, label='Validation Forecast', color='orange', linestyle='--')
plt.plot(test.index, test_forecast, label='Test Forecast', color='purple', linestyle='--')
plt.title(f'ARIMA Forecast\nValidation MAE: {val_metrics["MAE"]:.2f}, Test MAE: {test_metrics["MAE"]:.2f}')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 6. Residual Analysis: Validation
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes[0, 0].plot(val_residuals.index, val_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
#axes[0, 0].set_title('Validation Residuals vs Time')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].grid(True, alpha=0.3)
val_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', ax=axes[0, 1])
if val_residuals.notna().sum() > 1: # Ensure there are at least two valid residuals
val_residuals.plot(kind='kde', label='Validation Residuals Density', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].legend()
else:
print("Not enough data points for KDE plot.")
# val_residuals.plot(kind='kde', label='Validation Residuals Density', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
stats.probplot(val_residuals, dist="norm", plot=axes[1, 0])
#axes[1, 0].set_title('Validation Residuals Q-Q Plot')
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
plot_acf(val_residuals_cleaned, lags=20, ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
plt.tight_layout()
plt.show()
# 7. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes[0, 0].plot(test_residuals.index, test_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
#axes[0, 0].set_title('Test Residuals vs Time')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].grid(True, alpha=0.3)
test_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', ax=axes[0, 1])
if test_residuals.notna().sum() > 1: # Ensure there are at least two valid residuals
    test_residuals.plot(kind='kde', label='Test Residuals Density', ax=axes[0, 1])
    axes[0, 1].set_title('Test Residuals Histogram and Density')
    axes[0, 1].legend()
else:
    print("Not enough data points for KDE plot.")
    axes[0, 1].set_title('Test Residuals Histogram and Density')
    axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
stats.probplot(test_residuals, dist="norm", plot=axes[1, 0])
#axes[1, 0].set_title('Validation Residuals Q-Q Plot')
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
plot_acf(test_residuals_cleaned, lags=20, ax=axes[1, 1], title='ACF of Test Residuals', color='red')
plt.tight_layout()
plt.show()
# Add the model: ARIMA
tempSeries = pd.Series(val_residuals)
ResTableVal.loc[8] = {"ModelName": "ARIMA", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_residuals)
ResTableTest.loc[8] = {"ModelName": "ARIMA", "ResValues": tempSeries.values}
Metrics Table:
| | MAPE | MAE | RMSE | MSE | Ljung-Box p-value | Randomness p-value | KS-Normality p-value |
|---|---|---|---|---|---|---|---|
| Validation | 6.6364 | 1.4075 | 1.7045 | 2.9054 | 0.0 | 0.0004 | 0.1055 |
| Test | 12.8064 | 2.9088 | 3.8886 | 15.1211 | 0.0 | 0.0001 | 0.0000 |
Test Interpretation Table:
| | Randomness | KS-Normality |
|---|---|---|
| Validation | Not Random | Normal |
| Test | Not Random | Not Normal |
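Both ARIMA cells fix the order at (1, 1, 1) with a note to adjust it as needed. One common, if rough, way to pick it is a small grid search over (p, d, q) by AIC. The sketch below runs such a search on a synthetic AR(1) series; the data and grid bounds are illustrative assumptions, not the station data:

```python
import warnings
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

warnings.filterwarnings("ignore")

# Synthetic AR(1) series standing in for the temperature data
rng = np.random.default_rng(0)
eps = rng.normal(size=300)
vals = np.zeros(300)
for t in range(1, 300):
    vals[t] = 0.7 * vals[t - 1] + eps[t]
y = pd.Series(vals)

# Try every small (p, d, q) combination and keep the lowest AIC
best_order, best_aic = None, np.inf
for p in range(3):
    for d in range(2):
        for q in range(3):
            try:
                aic = ARIMA(y, order=(p, d, q)).fit().aic
            except Exception:
                continue  # some orders may fail to converge
            if aic < best_aic:
                best_order, best_aic = (p, d, q), aic

print(best_order, round(best_aic, 2))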
The code implements an ARIMA-Rolling model to forecast a time series of meteorological data collected at station A601 in southeastern Brazil, splitting the data into training, validation, and test sets.
Data preparation:
The time series is split into three sets: training, validation, and test.
The frequency of the time series index is set to hourly ('H').
ARIMA modeling:
An ARIMA model with parameters (p, d, q) = (1, 1, 1) is refitted at every step: each forecast is one step ahead, after which the actual observation is appended to the history (rolling forecast).
Rolling forecasts are produced for the validation and test sets.
Model evaluation:
Error metrics such as MAPE, MAE, RMSE, and MSE are computed to assess forecast accuracy.
Statistical tests (Ljung-Box, Wald-Wolfowitz, and Kolmogorov-Smirnov) are applied to the residuals to check their randomness and normality.
Visualization:
The training, validation, and test data are plotted together with the forecasts.
The residuals are analyzed and visualized with histograms, density plots, Q-Q plots, and the ACF (autocorrelation function).
Results:
Tables present the evaluation metrics and the interpretation of the statistical tests.
In short, the code implements an ARIMA-Rolling model to forecast the temperature over the next 24 hours from the given time series, evaluates its performance with metrics and statistical tests, and visualizes the results and residuals for analysis.
The metric results and the interpretation of the normality and randomness tests are presented in separate tables.
# JDF-2025Mar09 - ARIMA-Rolling
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error
# 0. Set the original data in data_ts
data_ts = data_ori
# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24
# Set the frequency of the datetime index
data_ts = data_ts.asfreq('H')
# Test: Last 24 values
test = data_ts.iloc[-test_size:]
# Validation: 24 values before test
validation = data_ts.iloc[-(test_size + val_size):-test_size]
# Training: All other records
train = data_ts.iloc[:-(test_size + val_size)]
# 2. Build and fit the ARIMA model
def fit_arima_model(data, order):
"""
Fit an ARIMA model to the data.
Parameters:
data (pd.Series): Input time series.
order (tuple): (p, d, q) parameters of the ARIMA model.
Returns:
model: Fitted ARIMA model.
"""
model = ARIMA(data, order=order)
return model.fit()
# 3. Rolling Forecast
def rolling_forecast_arima(train, test, order):
"""
Perform rolling forecasts using the ARIMA model.
Parameters:
train (pd.Series): Training data.
test (pd.Series): Test data.
order (tuple): (p, d, q) parameters of the ARIMA model.
Returns:
forecast: Forecasted values as a Pandas Series.
"""
history = train.copy()
forecast = []
for i in range(len(test)):
# Fit the ARIMA model
model = fit_arima_model(history, order)
# Forecast one step ahead
yhat = model.forecast(steps=1)
forecast.append(yhat.iloc[0])
# Append the actual test value to history
history = pd.concat([history, test.iloc[i:i+1]])
return pd.Series(forecast, index=test.index)
# ARIMA order
arima_order = (1, 1, 1) # Adjust the (p, d, q) order as needed
# Rolling forecast for validation
val_forecast = rolling_forecast_arima(train, validation, arima_order)
# Rolling forecast for test
combined_train = pd.concat([train, validation])
test_forecast = rolling_forecast_arima(combined_train, test, arima_order)
# 4. Calculate metrics
def calculate_metrics(actual, forecast):
"""
Calculate MAPE, MAE, RMSE, MSE, and Ljung-Box p-value.
"""
actual = actual.replace(0, 1e-9) # Avoid division by zero
metrics = {
'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
'MAE': mean_absolute_error(actual, forecast),
'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
'MSE': mean_squared_error(actual, forecast),
}
return metrics
# Validation metrics
val_metrics = calculate_metrics(validation, val_forecast)
val_residuals = validation - val_forecast
# Drop NaN values from residuals before Ljung-Box test
val_residuals_cleaned = val_residuals.dropna()
if len(val_residuals_cleaned) > 0:
val_metrics['Ljung-Box p-value'] = acorr_ljungbox(val_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]
else:
val_metrics['Ljung-Box p-value'] = np.nan # Handle case where all residuals are NaN
# Test metrics
test_metrics = calculate_metrics(test, test_forecast)
test_residuals = test - test_forecast
# Drop NaN values from residuals before Ljung-Box test
test_residuals_cleaned = test_residuals.dropna()
if len(test_residuals_cleaned) > 0:
test_metrics['Ljung-Box p-value'] = acorr_ljungbox(test_residuals_cleaned, lags=10, return_df=True)['lb_pvalue'].values[0]
else:
test_metrics['Ljung-Box p-value'] = np.nan # Handle case where all residuals are NaN
# Perform Wald-Wolfowitz Randomness Test (on NaN-free residuals, as for Ljung-Box)
val_runs_stat, val_runs_pvalue = runstest_1samp(val_residuals_cleaned)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_residuals_cleaned)
# Add randomness test results to metrics
val_metrics['Randomness p-value'] = val_runs_pvalue
test_metrics['Randomness p-value'] = test_runs_pvalue
# Perform Kolmogorov-Smirnov Normality Test against N(0, 1), on NaN-free residuals
val_ks_stat, val_ks_pvalue = stats.kstest(val_residuals_cleaned, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_residuals_cleaned, 'norm')
# Add normality test results to metrics
val_metrics['KS-Normality p-value'] = val_ks_pvalue
test_metrics['KS-Normality p-value'] = test_ks_pvalue
# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_metrics, test_metrics], index=['Validation', 'Test'])
# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])
# Print the tables
print("Metrics Table:")
display(metrics_table.round(4)) # Use display() to render the DataFrame
print("\nTest Interpretation Table:")
display(interpretation_table.round(4)) # Use display() to render the DataFrame
# 5. Plot Training (last 48 observations), Validation, Test, and Forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(validation.index, val_forecast, label='Validation Forecast', color='orange', linestyle='--')
plt.plot(test.index, test_forecast, label='Test Forecast', color='purple', linestyle='--')
plt.title(f'ARIMA-Rolling Forecast\nValidation MAE: {val_metrics["MAE"]:.2f}, Test MAE: {test_metrics["MAE"]:.2f}')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 6. Residual Analysis: Validation
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes[0, 0].plot(val_residuals.index, val_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].grid(True, alpha=0.3)
val_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', ax=axes[0, 1])
if val_residuals.notna().sum() > 1: # Ensure there are at least two valid residuals
val_residuals.plot(kind='kde', label='Validation Residuals Density', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].legend()
else:
print("Not enough data points for KDE plot.")
stats.probplot(val_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
plot_acf(val_residuals_cleaned, lags=20, ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
plt.tight_layout()
plt.show()
# 7. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes[0, 0].plot(test_residuals.index, test_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].grid(True, alpha=0.3)
test_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', ax=axes[0, 1])
if test_residuals.notna().sum() > 1:  # Ensure there are at least two valid residuals
    test_residuals.plot(kind='kde', label='Test Residuals Density', ax=axes[0, 1])
else:
    print("Not enough data points for KDE plot.")
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
stats.probplot(test_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
plot_acf(test_residuals_cleaned, lags=20, ax=axes[1, 1], title='ACF of Test Residuals', color='red')
plt.tight_layout()
plt.show()
# Add the model: ARIMA-Rolling
tempSeries = pd.Series(val_residuals)
ResTableVal.loc[9] = {"ModelName": "ARIMA-Rolling", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_residuals)
ResTableTest.loc[9] = {"ModelName": "ARIMA-Rolling", "ResValues": tempSeries.values}
Metrics Table:
| | MAPE | MAE | RMSE | MSE | Ljung-Box p-value | Randomness p-value | KS-Normality p-value |
|---|---|---|---|---|---|---|---|
| Validation | 3.0830 | 0.6450 | 0.7942 | 0.6308 | 0.7384 | 0.8062 | 0.7772 |
| Test | 2.9237 | 0.5964 | 0.7443 | 0.5540 | 0.1419 | 0.9429 | 0.5619 |
Test Interpretation Table:
| | Randomness | KS-Normality |
|---|---|---|
| Validation | Random | Normal |
| Test | Random | Normal |
The code implements a Support Vector Regression (SVR) model to forecast the temperature for the next 24 hours from the time series obtained at station A601 in southeastern Brazil. The libraries used are statsmodels and sklearn.
The most relevant points are summarized below:
Data preparation:
The time series is split into training, validation, and test sets.
Missing values (NaN) are filled with the ffill method.
The data are scaled with MinMaxScaler to normalize them to the range [0, 1].
Building the SVR model:
GridSearchCV is used to find the best hyperparameters for the SVR model (C, epsilon, kernel).
The SVR model is trained on the training data.
Forecasting:
Forecasts are produced for the validation and test sets with the trained SVR model.
The forecasts are transformed back to the original scale.
Model evaluation:
Error metrics such as MAPE, MAE, RMSE, and MSE are computed.
Statistical tests (Ljung-Box, Wald-Wolfowitz, and Kolmogorov-Smirnov) are applied to the residuals to check their randomness and normality.
Visualization:
The training, validation, and test data are plotted together with the forecasts.
The residuals are analyzed and visualized with histograms, density plots, Q-Q plots, and the ACF (autocorrelation function).
Results:
Tables with the evaluation metrics and the interpretation of the statistical tests are presented.
In summary, the code implements an SVR model to forecast a time series, evaluates its performance with metrics and statistical tests, and visualizes the results and residuals for analysis.
The metric results and the interpretation of the normality and randomness tests are presented in separate tables.
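The scaling step listed above can be sketched with plain NumPy; the round trip mirrors what MinMaxScaler's fit_transform and inverse_transform do (hypothetical temperatures, not the station series):

```python
# Min-max scaling to [0, 1] and the inverse mapping used to bring
# forecasts back to the original scale (hypothetical temperatures).
import numpy as np

temps = np.array([18.0, 22.0, 26.0])
lo, hi = temps.min(), temps.max()

scaled = (temps - lo) / (hi - lo)     # -> [0.0, 0.5, 1.0]
recovered = scaled * (hi - lo) + lo   # inverse transform -> [18.0, 22.0, 26.0]

print(scaled.tolist(), recovered.tolist())
```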
# JDF-2025Mar09 - SVR (without AFS)
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from statsmodels.sandbox.stats.runs import runstest_1samp
# 0. Fix NaN values and prepare the dataset
data_ts = data_ori.ffill()[-1000:]  # forward-fill NaNs, keep the last 1000 observations
# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24
# Test: last 24 values, plus 6 extra values as look-back warm-up
test = data_ts.iloc[-test_size - 6:]
# Validation: the 24 values before test, plus 6 warm-up values
validation = data_ts.iloc[-(test_size + val_size) - 6:-test_size]
# Training: all remaining records
train = data_ts.iloc[:-(test_size + val_size)]
# 2. Prepare data for the SVR model
def create_dataset(data, look_back=6):
"""
Create sequences of data for SVR input.
"""
X, y = [], []
for i in range(len(data) - look_back):
X.append(data.iloc[i:i + look_back].values)
y.append(data.iloc[i + look_back])
return np.array(X), np.array(y)
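To make the sliding-window layout concrete, here is a small run of create_dataset on a hypothetical series (redefined so the snippet is self-contained): each row of X holds look_back consecutive values, and y holds the value that follows each window.

```python
# Sliding-window demo: X rows are look_back-long windows, y the next value.
import numpy as np
import pandas as pd

def create_dataset(data, look_back=6):
    X, y = [], []
    for i in range(len(data) - look_back):
        X.append(data.iloc[i:i + look_back].values)
        y.append(data.iloc[i + look_back])
    return np.array(X), np.array(y)

series = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
X, y = create_dataset(series, look_back=3)
print(X.shape, y.tolist())  # (3, 3) [13.0, 14.0, 15.0]
```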
# Scale the data
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train.values.reshape(-1, 1))
validation_scaled = scaler.transform(validation.values.reshape(-1, 1))
test_scaled = scaler.transform(test.values.reshape(-1, 1))
# Create datasets
look_back = 6
X_train, y_train = create_dataset(pd.Series(train_scaled.flatten()), look_back)
X_val, y_val = create_dataset(pd.Series(validation_scaled.flatten()), look_back)
X_test, y_test = create_dataset(pd.Series(test_scaled.flatten()), look_back)
# 3. Build and train the SVR model
def build_svr_model(X_train, y_train):
param_grid = {
'C': [0.1, 1, 10],
'epsilon': [0.01, 0.1, 0.2],
'kernel': ['rbf', 'linear']
}
model = GridSearchCV(SVR(), param_grid, cv=3, scoring='neg_mean_squared_error')
model.fit(X_train, y_train)
print("Best SVR parameters:", model.best_params_)
return model.best_estimator_
svr_model = build_svr_model(X_train, y_train)
# Forecast for validation and test
def forecast_svr(model, data, look_back):
forecast = []
for i in range(len(data) - look_back):
X = data[i:i + look_back].reshape(1, look_back)
y_pred = model.predict(X)
forecast.append(y_pred[0])
return np.array(forecast)
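forecast_svr emits one prediction per look_back-long window, so len(data) - look_back values come out. A hypothetical stand-in model (constant prediction instead of the fitted SVR) confirms the shape:

```python
# Shape check for forecast_svr with a hypothetical constant-prediction model.
import numpy as np

def forecast_svr(model, data, look_back):
    forecast = []
    for i in range(len(data) - look_back):
        X = data[i:i + look_back].reshape(1, look_back)
        forecast.append(model.predict(X)[0])
    return np.array(forecast)

class MeanModel:
    """Stand-in for the fitted SVR: always predicts the same value."""
    def predict(self, X):
        return np.full(len(X), 2.5)

preds = forecast_svr(MeanModel(), np.arange(10.0), look_back=3)
print(preds.shape)  # (7,): one prediction per window
```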
# Forecast for validation
val_forecast_scaled = forecast_svr(svr_model, validation_scaled, look_back)
val_forecast = scaler.inverse_transform(val_forecast_scaled.reshape(-1, 1)).flatten()
val_residuals = validation.iloc[look_back:] - val_forecast
# Forecast for test
test_forecast_scaled = forecast_svr(svr_model, test_scaled, look_back)
test_forecast = scaler.inverse_transform(test_forecast_scaled.reshape(-1, 1)).flatten()
test_residuals = test.iloc[look_back:] - test_forecast
# 4. Calculate metrics
def calculate_metrics(actual, forecast, residuals):
"""
Calculate MAPE, MAE, RMSE, MSE, and Ljung-Box p-value.
"""
actual = actual.replace(0, 1e-9) # Avoid division by zero
metrics = {
'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
'MAE': mean_absolute_error(actual, forecast),
'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
'MSE': mean_squared_error(actual, forecast),
}
# Calculate Ljung-Box p-value
residuals_cleaned = residuals.dropna()
if len(residuals_cleaned) > 0:
ljungbox_result = acorr_ljungbox(residuals_cleaned, lags=10, return_df=True)
metrics['Ljung-Box p-value'] = ljungbox_result['lb_pvalue'].values[0]
else:
metrics['Ljung-Box p-value'] = np.nan # Handle case where all residuals are NaN
return metrics
# Validation metrics
val_metrics = calculate_metrics(validation.iloc[look_back:], val_forecast, val_residuals)
# Test metrics
test_metrics = calculate_metrics(test.iloc[look_back:], test_forecast, test_residuals)
# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_residuals)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_residuals)
# Add randomness test results to metrics
val_metrics['Randomness p-value'] = val_runs_pvalue
test_metrics['Randomness p-value'] = test_runs_pvalue
# Perform Kolmogorov-Smirnov Normality Test
val_ks_stat, val_ks_pvalue = stats.kstest(val_residuals, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_residuals, 'norm')
# Add normality test results to metrics
val_metrics['KS-Normality p-value'] = val_ks_pvalue
test_metrics['KS-Normality p-value'] = test_ks_pvalue
# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_metrics, test_metrics], index=['Validation', 'Test'])
# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])
# Print the tables
print("Metrics Table:")
display(metrics_table.round(4)) # Use display() to render the DataFrame
print("\nTest Interpretation Table:")
display(interpretation_table.round(4)) # Use display() to render the DataFrame
# 5. Plot final forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(validation.index[look_back:], val_forecast, label='Validation SVR Forecast', color='orange', linestyle='--')
plt.plot(test.index[look_back:], test_forecast, label='Test SVR Forecast', color='purple', linestyle='--')
plt.title('SVR Forecast')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 6. Residual Analysis: Validation (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Quadrant 0,0: Residuals vs Time (Validation)
axes[0, 0].plot(val_residuals.index, val_residuals, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Quadrant 0,1: Residuals Density and Histogram (Validation)
val_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', color='blue', ax=axes[0, 1])
val_residuals.plot(kind='kde', label='Validation Residuals Density', color='blue', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Quadrant 1,0: Q-Q Plot (Validation)
stats.probplot(val_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)
# Quadrant 1,1: ACF (Validation)
plot_acf(val_residuals, lags=min(20, len(val_residuals) - 1), ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 7. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Quadrant 0,0: Residuals vs Time (Test)
axes[0, 0].plot(test_residuals.index, test_residuals, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Quadrant 0,1: Residuals Density and Histogram (Test)
test_residuals.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', color='red', ax=axes[0, 1])
test_residuals.plot(kind='kde', label='Test Residuals Density', color='red', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Quadrant 1,0: Q-Q Plot (Test)
stats.probplot(test_residuals, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)
# Quadrant 1,1: ACF (Test)
plot_acf(test_residuals, lags=min(20, len(test_residuals) - 1), ax=axes[1, 1], title='ACF of Test Residuals', color='red')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Add the model: SVR
tempSeries = pd.Series(val_residuals)
ResTableVal.loc[10] = {"ModelName": "SVR", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_residuals)
ResTableTest.loc[10] = {"ModelName": "SVR", "ResValues": tempSeries.values}
Best SVR parameters: {'C': 10, 'epsilon': 0.01, 'kernel': 'linear'}
Metrics Table:
| | MAPE | MAE | RMSE | MSE | Ljung-Box p-value | Randomness p-value | KS-Normality p-value |
|---|---|---|---|---|---|---|---|
| Validation | 3.2721 | 0.6778 | 0.8168 | 0.6672 | 0.6946 | 0.8062 | 0.1562 |
| Test | 3.1501 | 0.6364 | 0.7637 | 0.5832 | 0.2419 | 0.8609 | 0.0975 |
Test Interpretation Table:
| | Randomness | KS-Normality |
|---|---|---|
| Validation | Random | Normal |
| Test | Random | Normal |
Below is the Python code for the SVR-AFS model.
The code implements an SVR (Support Vector Regression) model enhanced with an Adaptive Fourier Series (AFS) to forecast the temperature for the next 24 hours from the time series of meteorological data recorded at station A601 in southeastern Brazil. The most relevant points are summarized below:
Data preparation:
The time series is split into training, validation, and test sets.
Missing values (NaN) are filled with the ffill method.
The data are scaled with MinMaxScaler to normalize them to the range [0, 1].
Initial SVR modeling:
An SVR model is trained with GridSearchCV to find the best hyperparameters (C, epsilon, kernel).
Forecasts are produced for the validation and test sets.
Enhancement with the Adaptive Fourier Series (AFS):
The residuals of the initial SVR model are computed.
Fourier features (sine and cosine components) are generated from the residuals.
A second SVR model is trained on these features to correct the residuals.
Corrected forecast:
The initial forecasts are adjusted with the residual corrections produced by the second SVR model.
Model evaluation:
Error metrics such as MAPE, MAE, RMSE, and MSE are computed.
Statistical tests (Ljung-Box, Wald-Wolfowitz, and Kolmogorov-Smirnov) are applied to the residuals to check their randomness and normality.
Visualization:
The training, validation, and test data are plotted together with the corrected forecasts.
The residuals are analyzed and visualized with histograms, density plots, Q-Q plots, and the ACF (autocorrelation function).
Results:
Tables with the evaluation metrics and the interpretation of the statistical tests are presented.
In summary, the code implements an SVR model enhanced with AFS to forecast a time series, corrects the residuals using Fourier features, evaluates its performance with metrics and statistical tests, and visualizes the results and residuals for analysis.
The metric results and the interpretation of the normality and randomness tests are presented in separate tables.
# JDF-2025Mar10- SVR-AFS
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from statsmodels.stats.diagnostic import acorr_ljungbox
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from statsmodels.sandbox.stats.runs import runstest_1samp
# Adaptive Fourier Series (AFS) for Residuals
def adaptive_fourier_series(data, n_components=3):
"""
Generate Adaptive Fourier Series (AFS) features for a given series.
Parameters:
data (pd.Series): Input time series (residuals).
n_components (int): Number of Fourier components to use.
Returns:
pd.DataFrame: DataFrame with AFS features (cosine and sine components).
"""
t = np.linspace(0, 1, len(data), endpoint=False)
afs_features = pd.DataFrame(index=data.index)
for n in range(1, n_components + 1):
afs_features[f'cos_{n}'] = np.cos(2 * np.pi * n * t * data.values)
afs_features[f'sin_{n}'] = np.sin(2 * np.pi * n * t * data.values)
return afs_features
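A small run on hypothetical residuals (function repeated so the snippet is self-contained) shows the shape of the AFS feature matrix: one cosine and one sine column per component, and a first row of (1, 0) pairs because t starts at 0.

```python
# AFS feature matrix demo: n_components (cos, sin) column pairs.
import numpy as np
import pandas as pd

def adaptive_fourier_series(data, n_components=3):
    t = np.linspace(0, 1, len(data), endpoint=False)
    afs_features = pd.DataFrame(index=data.index)
    for n in range(1, n_components + 1):
        afs_features[f'cos_{n}'] = np.cos(2 * np.pi * n * t * data.values)
        afs_features[f'sin_{n}'] = np.sin(2 * np.pi * n * t * data.values)
    return afs_features

residuals = pd.Series([0.2, -0.1, 0.4, 0.0])
features = adaptive_fourier_series(residuals, n_components=2)
print(features.shape, list(features.columns))  # (4, 4) ['cos_1', 'sin_1', 'cos_2', 'sin_2']
```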
# 0. Fix NaN values and prepare the dataset
data_ts = data_ori.ffill()[-1000:]  # forward-fill NaNs, keep the last 1000 observations
# 1. Split the series into training, validation, and test sets
test_size = 24
val_size = 24
test = data_ts.iloc[-test_size-6:]
validation = data_ts.iloc[-(test_size + val_size)-6:-test_size]
train = data_ts.iloc[:-(test_size + val_size)]
train2 = train = data_ts.iloc[:-test_size]  # NOTE: chained assignment; from here on, train also spans the validation window
# 2. Prepare data for the initial SVR model
def create_dataset(data, look_back=6):
"""
Create sequences of data for SVR input.
"""
X, y = [], []
for i in range(len(data) - look_back):
X.append(data.iloc[i:i + look_back].values)
y.append(data.iloc[i + look_back])
return np.array(X), np.array(y)
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train.values.reshape(-1, 1))
validation_scaled = scaler.transform(validation.values.reshape(-1, 1))
test_scaled = scaler.transform(test.values.reshape(-1, 1))
look_back = 6
X_train, y_train = create_dataset(pd.Series(train_scaled.flatten()), look_back)
X_val, y_val = create_dataset(pd.Series(validation_scaled.flatten()), look_back)
X_test, y_test = create_dataset(pd.Series(test_scaled.flatten()), look_back)
# 3. Build and train the initial SVR model
def build_svr_model(X_train, y_train):
param_grid = {
'C': [0.1, 1, 10],
'epsilon': [0.01, 0.1, 0.2],
'kernel': ['rbf', 'linear']
}
model = GridSearchCV(SVR(), param_grid, cv=3, scoring='neg_mean_squared_error')
model.fit(X_train, y_train)
print("Best SVR parameters:", model.best_params_)
return model.best_estimator_
svr_model = build_svr_model(X_train, y_train)
# Forecast for validation and test
def forecast_svr(model, data, look_back):
forecast = []
for i in range(len(data) - look_back):
X = data[i:i + look_back].reshape(1, look_back)
y_pred = model.predict(X)
forecast.append(y_pred[0])
return np.array(forecast)
val_forecast_scaled = forecast_svr(svr_model, validation_scaled, look_back)
val_forecast = scaler.inverse_transform(val_forecast_scaled.reshape(-1, 1)).flatten()
val_residuals = validation.iloc[look_back:] - val_forecast
test_forecast_scaled = forecast_svr(svr_model, test_scaled, look_back)
test_forecast = scaler.inverse_transform(test_forecast_scaled.reshape(-1, 1)).flatten()
test_residuals = test.iloc[look_back:] - test_forecast
# 4. Add AFS features for residuals and train a second SVR-AFS model
n_components = 3
val_residuals_features = adaptive_fourier_series(val_residuals, n_components)
test_residuals_features = adaptive_fourier_series(test_residuals, n_components)
# Train SVR on residuals
X_residuals = val_residuals_features.values
y_residuals = val_residuals.values
svr_residual_model = SVR(kernel='rbf', C=1.0, epsilon=0.1)
svr_residual_model.fit(X_residuals, y_residuals)
# Correct the forecasts using the residual SVR model
val_residual_corrections = svr_residual_model.predict(val_residuals_features)
val_corrected_forecast = val_forecast + val_residual_corrections
test_residual_corrections = svr_residual_model.predict(test_residuals_features)
test_corrected_forecast = test_forecast + test_residual_corrections
# JDF - 2025Mar09 - residuals after the AFS correction
val_residuals_def = validation.iloc[look_back:] - val_corrected_forecast
test_residuals_def = test.iloc[look_back:] - test_corrected_forecast
# 5. Calculate metrics
def calculate_metrics(actual, forecast, residuals):
"""
Calculate MAPE, MAE, RMSE, MSE, and Ljung-Box p-value.
"""
actual = actual.replace(0, 1e-9) # Avoid division by zero
metrics = {
'MAPE': np.mean(np.abs((actual - forecast) / actual)) * 100,
'MAE': mean_absolute_error(actual, forecast),
'RMSE': np.sqrt(mean_squared_error(actual, forecast)),
'MSE': mean_squared_error(actual, forecast),
}
# Calculate Ljung-Box p-value
residuals_cleaned = residuals.dropna()
if len(residuals_cleaned) > 0:
ljungbox_result = acorr_ljungbox(residuals_cleaned, lags=10, return_df=True)
metrics['Ljung-Box p-value'] = ljungbox_result['lb_pvalue'].values[0]
else:
metrics['Ljung-Box p-value'] = np.nan # Handle case where all residuals are NaN
return metrics
# Validation metrics
val_metrics = calculate_metrics(validation.iloc[look_back:], val_corrected_forecast, val_residuals_def)
# Test metrics
test_metrics = calculate_metrics(test.iloc[look_back:], test_corrected_forecast, test_residuals_def)
# Perform Wald-Wolfowitz Randomness Test
val_runs_stat, val_runs_pvalue = runstest_1samp(val_residuals_def)
test_runs_stat, test_runs_pvalue = runstest_1samp(test_residuals_def)
# Add randomness test results to metrics
val_metrics['Randomness p-value'] = val_runs_pvalue
test_metrics['Randomness p-value'] = test_runs_pvalue
# Perform Kolmogorov-Smirnov Normality Test
val_ks_stat, val_ks_pvalue = stats.kstest(val_residuals_def, 'norm')
test_ks_stat, test_ks_pvalue = stats.kstest(test_residuals_def, 'norm')
# Add normality test results to metrics
val_metrics['KS-Normality p-value'] = val_ks_pvalue
test_metrics['KS-Normality p-value'] = test_ks_pvalue
# Create a DataFrame for metrics
metrics_table = pd.DataFrame([val_metrics, test_metrics], index=['Validation', 'Test'])
# Create a separate table for interpretation
interpretation_table = pd.DataFrame({
'Randomness': ['Random' if val_runs_pvalue > 0.05 else 'Not Random', 'Random' if test_runs_pvalue > 0.05 else 'Not Random'],
'KS-Normality': ['Normal' if val_ks_pvalue > 0.05 else 'Not Normal', 'Normal' if test_ks_pvalue > 0.05 else 'Not Normal']
}, index=['Validation', 'Test'])
# Print the tables
print("Metrics Table:")
display(metrics_table.round(4)) # Use display() to render the DataFrame
print("\nTest Interpretation Table:")
display(interpretation_table.round(4)) # Use display() to render the DataFrame
# 6. Plot final corrected forecasts
plt.figure(figsize=(14, 6))
plt.plot(train.iloc[-48:].index, train.iloc[-48:], label='Training (Last 48)', color='blue')
plt.plot(validation.index, validation, label='Validation', color='green')
plt.plot(test.index, test, label='Test', color='red')
plt.plot(validation.index[look_back:], val_corrected_forecast, label='Validation SVR-AFS Forecast', color='orange', linestyle='--')
plt.plot(test.index[look_back:], test_corrected_forecast, label='Test SVR-AFS Forecast', color='purple', linestyle='--')
plt.title('SVR-AFS Forecast with Residual Corrections')
plt.xlabel('Data_Hora')
plt.ylabel('Temperature (°C)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 7. Residual Analysis: Validation (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Quadrant 0,0: Residuals vs Time (Validation; post-AFS residuals, consistent with the test statistics above)
axes[0, 0].plot(val_residuals_def.index, val_residuals_def, label='Validation Residuals', color='blue')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Validation Residuals vs Time\nRandomness: {"Random" if val_runs_pvalue > 0.05 else "Not Random"} ({val_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Quadrant 0,1: Residuals Density and Histogram (Validation)
val_residuals_def.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Validation Residuals Histogram', color='blue', ax=axes[0, 1])
val_residuals_def.plot(kind='kde', label='Validation Residuals Density', color='blue', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Validation Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Quadrant 1,0: Q-Q Plot (Validation)
stats.probplot(val_residuals_def, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Validation Residuals Q-Q Plot\nKS-Normality: {"Normal" if val_ks_pvalue > 0.05 else "Not Normal"} ({val_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)
# Quadrant 1,1: ACF (Validation)
plot_acf(val_residuals_def, lags=min(20, len(val_residuals_def) - 1), ax=axes[1, 1], title='ACF of Validation Residuals', color='blue')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 8. Residual Analysis: Test (2x2 Grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Quadrant 0,0: Residuals vs Time (Test; post-AFS residuals, consistent with the test statistics above)
axes[0, 0].plot(test_residuals_def.index, test_residuals_def, label='Test Residuals', color='red')
axes[0, 0].axhline(0, color='black', linestyle='--')
axes[0, 0].set_title(f'Test Residuals vs Time\nRandomness: {"Random" if test_runs_pvalue > 0.05 else "Not Random"} ({test_runs_pvalue:.4f})')
axes[0, 0].set_xlabel('Data_Hora')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Quadrant 0,1: Residuals Density and Histogram (Test)
test_residuals_def.plot(kind='hist', bins=20, density=True, alpha=0.5, label='Test Residuals Histogram', color='red', ax=axes[0, 1])
test_residuals_def.plot(kind='kde', label='Test Residuals Density', color='red', linestyle='--', ax=axes[0, 1])
axes[0, 1].set_title('Test Residuals Histogram and Density')
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Density')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Quadrant 1,0: Q-Q Plot (Test)
stats.probplot(test_residuals_def, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title(f'Test Residuals Q-Q Plot\nKS-Normality: {"Normal" if test_ks_pvalue > 0.05 else "Not Normal"} ({test_ks_pvalue:.4f})')
axes[1, 0].grid(True, alpha=0.3)
# Quadrant 1,1: ACF (Test)
plot_acf(test_residuals_def, lags=min(20, len(test_residuals_def) - 1), ax=axes[1, 1], title='ACF of Test Residuals', color='red')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Add the model: SVR-AFS
tempSeries = pd.Series(val_residuals_def)
ResTableVal.loc[11] = {"ModelName": "SVR-AFS", "ResValues": tempSeries.values}
tempSeries = pd.Series(test_residuals_def)
ResTableTest.loc[11] = {"ModelName": "SVR-AFS", "ResValues": tempSeries.values}
Best SVR parameters: {'C': 10, 'epsilon': 0.01, 'kernel': 'linear'}
Metrics Table:
| | MAPE | MAE | RMSE | MSE | Ljung-Box p-value | Randomness p-value | KS-Normality p-value |
|---|---|---|---|---|---|---|---|
| Validation | 2.1869 | 0.4546 | 0.6653 | 0.4426 | 0.4072 | 0.0577 | 0.0933 |
| Test | 2.5641 | 0.5322 | 0.6960 | 0.4844 | 0.4042 | 0.9429 | 0.1784 |
Test Interpretation Table:
| | Randomness | KS-Normality |
|---|---|---|
| Validation | Random | Normal |
| Test | Random | Normal |
The following code implements the Diebold-Mariano test to compare the forecast accuracy of pairs of models, using the residuals obtained for each model on the validation set. The most relevant points are summarized below:
Diebold-Mariano test:
The diebold_mariano_test function compares two forecasting models using a loss function (squared error or absolute error).
It computes the DM statistic and the p-value to determine whether there is a significant difference in forecast accuracy.
Model comparison:
All unique pairs of models stored in ResTableVal are iterated over.
For each pair of models, the residuals (forecast errors) are retrieved and the Diebold-Mariano test is applied.
Results:
The test results (DM statistic and p-value) are stored in a DataFrame (ResDM).
Finally, the DataFrame with the results of all comparisons is printed.
In summary, the code compares the forecast accuracy of the different models with the Diebold-Mariano test, stores the results in a DataFrame, and prints them for analysis.
# JDF - 2025Mar11 - Diebold-Mariano - Validation
import pandas as pd
import numpy as np
from scipy.stats import norm

def diebold_mariano_test(actual, forecast1, forecast2, loss_function='squared'):
    """
    Performs the Diebold-Mariano test to compare two forecasts.

    Args:
        actual (np.array): Array of actual values.
        forecast1 (np.array): Array of forecast values from model 1.
        forecast2 (np.array): Array of forecast values from model 2.
        loss_function (str): 'squared' for squared error, 'absolute' for absolute error.

    Returns:
        float: DM statistic.
        float: p-value.
    """
    if len(actual) != len(forecast1) or len(actual) != len(forecast2):
        raise ValueError("Arrays must have the same length.")
    if loss_function == 'squared':
        d = (actual - forecast1)**2 - (actual - forecast2)**2
    elif loss_function == 'absolute':
        d = np.abs(actual - forecast1) - np.abs(actual - forecast2)
    else:
        raise ValueError("Invalid loss function. Choose 'squared' or 'absolute'.")
    mean_d = np.mean(d)
    var_d = np.var(d, ddof=1)  # ddof=1 for the sample variance
    n = len(d)
    if var_d == 0:  # all loss differentials identical: no difference
        return 0, 1.0
    dm_stat = mean_d / np.sqrt(var_d / n)
    p_value = 2 * (1 - norm.cdf(abs(dm_stat)))
    return dm_stat, p_value

# Initialize an empty dataframe to store DM test results
ResDM = pd.DataFrame(columns=['ModelName1', 'ModelName2', 'DMStatistic', 'DMpValue'])

# Get the list of unique models
models = ResTableVal['ModelName'].unique()

# Iterate over all unique pairs of models
for i in range(len(models)):
    for j in range(i + 1, len(models)):
        model1 = models[i]
        model2 = models[j]
        # Get residuals for the two models
        res1 = ResTableVal[ResTableVal['ModelName'] == model1]['ResValues'].iloc[0]
        res2 = ResTableVal[ResTableVal['ModelName'] == model2]['ResValues'].iloc[0]
        # Residuals are treated as forecast errors against a zero "actual" series
        actual_values = np.zeros(len(res1))
        # Perform Diebold-Mariano test
        dm_statistic, dm_pvalue = diebold_mariano_test(
            actual_values, res1, res2, loss_function='squared')
        # Append the result to the ResDM dataframe using pd.concat
        new_row = pd.DataFrame({
            'ModelName1': [model1],
            'ModelName2': [model2],
            'DMStatistic': [dm_statistic],
            'DMpValue': [dm_pvalue]
        })
        ResDM = pd.concat([ResDM, new_row], ignore_index=True)

# Print the resulting dataframe
print(ResDM)
       ModelName1 ModelName2  DMStatistic  DMpValue
0             SES        DES    -1.095432  0.273327
1             SES     SES-HW    -3.300037  0.000967
2             SES     DES-HW    -1.659247  0.097066
3             SES     TES-HW    -1.276185  0.201890
4             SES        RNN     1.755112  0.079240
..            ...        ...          ...       ...
61          ARIMA        SVR     2.939122  0.003291
62          ARIMA    SVR-AFS     3.264993  0.001095
63  ARIMA-Rolling        SVR    -0.401083  0.688359
64  ARIMA-Rolling    SVR-AFS     2.148632  0.031664
65            SVR    SVR-AFS     3.385631  0.000710

[66 rows x 4 columns]
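The p-values above rely on the asymptotic normal approximation, which can be somewhat liberal on short validation windows. An optional refinement, not part of the notebook, is the Harvey-Leybourne-Newbold small-sample correction, which rescales the DM statistic and compares it against a Student-t distribution. A sketch (the helper name `dm_test_hln` and the toy residuals are illustrative):

```python
import numpy as np
from scipy.stats import t

def dm_test_hln(res1, res2, h=1):
    """Diebold-Mariano test with the Harvey-Leybourne-Newbold
    small-sample correction (squared-error loss, zero actuals,
    forecast horizon h)."""
    d = np.asarray(res1)**2 - np.asarray(res2)**2
    n = len(d)
    dm_stat = d.mean() / np.sqrt(d.var(ddof=1) / n)
    # HLN correction factor; for h=1 it reduces to sqrt((n-1)/n)
    k = np.sqrt((n + 1 - 2*h + h*(h - 1)/n) / n)
    hln_stat = k * dm_stat
    # Two-sided p-value from a t distribution with n-1 degrees of freedom
    p_value = 2 * (1 - t.cdf(abs(hln_stat), df=n - 1))
    return hln_stat, p_value

# Example with invented residuals
stat, p = dm_test_hln([0.5, -0.3, 0.8, -0.2], [1.0, -0.9, 1.2, -0.7])
print(f"HLN statistic: {stat:.4f}, p-value: {p:.4f}")
```

The correction shrinks the statistic slightly and uses heavier tails, so borderline significances under the normal approximation may disappear for small n.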
The code below ranks the models based on the Diebold-Mariano results for the residuals obtained on the Validation set. The key points:
Model evaluation:
It uses the results stored in ResDM, which holds the DM statistics and p-values of the pairwise model comparisons.
Each model earns points whenever it significantly outperforms another model in the Diebold-Mariano test.
Significance criterion:
If the p-value is below the significance level (0.05), the difference between the two models is considered significant.
The better-performing model (according to the sign of the DM statistic) receives one additional point.
Model ranking:
Models are sorted from highest to lowest accumulated score.
The ranking is printed, showing each model's position and score.
In short, the code classifies the models by their relative performance in the Diebold-Mariano test, producing a best-to-worst ranking based on accumulated scores.
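The scoring rule is easiest to see on a tractable example. Below, a toy ResDM-like DataFrame with three invented comparisons: only the significant rows (p < 0.05) award a point, and the sign of the statistic decides the winner:

```python
import pandas as pd

# Toy DM results (hypothetical numbers) illustrating the scoring rule
toy = pd.DataFrame({
    'ModelName1':  ['A', 'A', 'B'],
    'ModelName2':  ['B', 'C', 'C'],
    'DMStatistic': [-2.5, 1.1, -3.0],
    'DMpValue':    [0.01, 0.27, 0.002],
})

scores = {}
for _, row in toy.iterrows():
    scores.setdefault(row['ModelName1'], 0)
    scores.setdefault(row['ModelName2'], 0)
    if row['DMpValue'] < 0.05:
        # Negative statistic: ModelName1 had the lower average loss
        winner = row['ModelName1'] if row['DMStatistic'] < 0 else row['ModelName2']
        scores[winner] += 1

print(scores)  # A beats B, A-vs-C not significant, B beats C
```

Here A and B each collect one win and C none, so A and B tie at the top; note that ties are broken arbitrarily by the sort in the notebook's ranking code.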
# JDF - 2025Mar11 - Diebold-Mariano
import pandas as pd

# Initialize a dictionary to store model scores
model_scores = {}

# Significance level
significance_level = 0.05

# Iterate through each row in ResDM
for index, row in ResDM.iterrows():
    model1 = row['ModelName1']
    model2 = row['ModelName2']
    dm_statistic = row['DMStatistic']
    dm_pvalue = row['DMpValue']
    # Initialize scores for models if not already present
    if model1 not in model_scores:
        model_scores[model1] = 0
    if model2 not in model_scores:
        model_scores[model2] = 0
    # Update scores based on DM results
    if dm_pvalue < significance_level:  # Statistically significant difference
        if dm_statistic < 0:  # Model1 is better
            model_scores[model1] += 1
        else:  # Model2 is better
            model_scores[model2] += 1

# Sort models by their scores in descending order
ranked_models = sorted(model_scores.keys(), key=lambda x: model_scores[x], reverse=True)

# Print the ranking
print("Model Ranking (Best to Worst):")
for rank, model in enumerate(ranked_models, start=1):
    print(f"{rank}. {model} (Score: {model_scores[model]})")
Model Ranking (Best to Worst):
1. SVR-AFS (Score: 10)
2. RNN (Score: 5)
3. LSTM (Score: 5)
4. ARIMA-Rolling (Score: 5)
5. MLP (Score: 4)
6. SVR (Score: 4)
7. SES (Score: 2)
8. DES (Score: 0)
9. SES-HW (Score: 0)
10. DES-HW (Score: 0)
11. TES-HW (Score: 0)
12. ARIMA (Score: 0)
The following code applies the Diebold-Mariano test to compare the forecast accuracy of pairs of models, using the residuals obtained on the Test set. The key points:
Diebold-Mariano test:
The diebold_mariano_test function compares two forecasting models using a loss function (squared error or absolute error).
It computes the DM statistic and its p-value to determine whether the difference in forecast accuracy is significant.
Model comparison:
The code iterates over every unique pair of models stored in ResTableTest.
For each pair, it retrieves the residuals (forecast errors) and runs the Diebold-Mariano test.
Results:
The test results (DM statistic and p-value) are stored in a DataFrame (ResDM).
Finally, the DataFrame with all the pairwise comparisons is printed.
In short, the code compares the forecast accuracy of the different models with the Diebold-Mariano test, stores the results in a DataFrame, and prints them for analysis.
# JDF - 2025Mar11 - Diebold-Mariano - Test
import pandas as pd
import numpy as np
from scipy.stats import norm

def diebold_mariano_test(actual, forecast1, forecast2, loss_function='squared'):
    """
    Performs the Diebold-Mariano test to compare two forecasts.

    Args:
        actual (np.array): Array of actual values.
        forecast1 (np.array): Array of forecast values from model 1.
        forecast2 (np.array): Array of forecast values from model 2.
        loss_function (str): 'squared' for squared error, 'absolute' for absolute error.

    Returns:
        float: DM statistic.
        float: p-value.
    """
    if len(actual) != len(forecast1) or len(actual) != len(forecast2):
        raise ValueError("Arrays must have the same length.")
    if loss_function == 'squared':
        d = (actual - forecast1)**2 - (actual - forecast2)**2
    elif loss_function == 'absolute':
        d = np.abs(actual - forecast1) - np.abs(actual - forecast2)
    else:
        raise ValueError("Invalid loss function. Choose 'squared' or 'absolute'.")
    mean_d = np.mean(d)
    var_d = np.var(d, ddof=1)  # ddof=1 for the sample variance
    n = len(d)
    if var_d == 0:  # all loss differentials identical: no difference
        return 0, 1.0
    dm_stat = mean_d / np.sqrt(var_d / n)
    p_value = 2 * (1 - norm.cdf(abs(dm_stat)))
    return dm_stat, p_value

# Initialize an empty dataframe to store DM test results
ResDM = pd.DataFrame(columns=['ModelName1', 'ModelName2', 'DMStatistic', 'DMpValue'])

# Get the list of unique models
models = ResTableTest['ModelName'].unique()

# Iterate over all unique pairs of models
for i in range(len(models)):
    for j in range(i + 1, len(models)):
        model1 = models[i]
        model2 = models[j]
        # Get residuals for the two models
        res1 = ResTableTest[ResTableTest['ModelName'] == model1]['ResValues'].iloc[0]
        res2 = ResTableTest[ResTableTest['ModelName'] == model2]['ResValues'].iloc[0]
        # Residuals are treated as forecast errors against a zero "actual" series
        actual_values = np.zeros(len(res1))
        # Perform Diebold-Mariano test
        dm_statistic, dm_pvalue = diebold_mariano_test(
            actual_values, res1, res2, loss_function='squared')
        # Append the result to the ResDM dataframe using pd.concat
        new_row = pd.DataFrame({
            'ModelName1': [model1],
            'ModelName2': [model2],
            'DMStatistic': [dm_statistic],
            'DMpValue': [dm_pvalue]
        })
        ResDM = pd.concat([ResDM, new_row], ignore_index=True)

# Print the resulting dataframe
print(ResDM)
       ModelName1 ModelName2  DMStatistic  DMpValue
0             SES        DES    -0.938372  0.348053
1             SES     SES-HW    -2.966626  0.003011
2             SES     DES-HW    -1.296910  0.194662
3             SES     TES-HW     0.871441  0.383514
4             SES        RNN     2.307004  0.021055
..            ...        ...          ...       ...
61          ARIMA        SVR     3.859046  0.000114
62          ARIMA    SVR-AFS     3.899375  0.000096
63  ARIMA-Rolling        SVR    -0.260840  0.794216
64  ARIMA-Rolling    SVR-AFS     0.703635  0.481660
65            SVR    SVR-AFS     0.797416  0.425210

[66 rows x 4 columns]
The code below ranks the models based on the Diebold-Mariano results for the residuals obtained on the Test set. The key points:
Model evaluation:
It uses the results stored in ResDM, which holds the DM statistics and p-values of the pairwise model comparisons.
Each model earns points whenever it significantly outperforms another model in the Diebold-Mariano test.
Significance criterion:
If the p-value is below the significance level (0.05), the difference between the two models is considered significant.
The better-performing model (according to the sign of the DM statistic) receives one additional point.
Model ranking:
Models are sorted from highest to lowest accumulated score.
The ranking is printed, showing each model's position and score.
In short, the code classifies the models by their relative performance in the Diebold-Mariano test, producing a best-to-worst ranking based on accumulated scores.
# JDF - 2025Mar11 - Diebold-Mariano - Test
import pandas as pd

# Initialize a dictionary to store model scores
model_scores = {}

# Significance level
significance_level = 0.05

# Iterate through each row in ResDM
for index, row in ResDM.iterrows():
    model1 = row['ModelName1']
    model2 = row['ModelName2']
    dm_statistic = row['DMStatistic']
    dm_pvalue = row['DMpValue']
    # Initialize scores for models if not already present
    if model1 not in model_scores:
        model_scores[model1] = 0
    if model2 not in model_scores:
        model_scores[model2] = 0
    # Update scores based on DM results
    if dm_pvalue < significance_level:  # Statistically significant difference
        if dm_statistic < 0:  # Model1 is better
            model_scores[model1] += 1
        else:  # Model2 is better
            model_scores[model2] += 1

# Sort models by their scores in descending order
ranked_models = sorted(model_scores.keys(), key=lambda x: model_scores[x], reverse=True)

# Print the ranking
print("Model Ranking (Best to Worst):")
for rank, model in enumerate(ranked_models, start=1):
    print(f"{rank}. {model} (Score: {model_scores[model]})")
Model Ranking (Best to Worst):
1. RNN (Score: 5)
2. MLP (Score: 5)
3. LSTM (Score: 5)
4. ARIMA-Rolling (Score: 5)
5. SVR (Score: 5)
6. SVR-AFS (Score: 5)
7. TES-HW (Score: 3)
8. SES (Score: 2)
9. DES (Score: 1)
10. SES-HW (Score: 1)
11. DES-HW (Score: 1)
12. ARIMA (Score: 0)